Downloaded from genome.cshlp.org on October 4, 2021 - Published by Cold Spring Harbor Laboratory Press

RESEARCH

Identification and Chromosomal Localization of Human Genes Containing CAG/CTG Repeats Expressed in Testis and Brain

Frédérique Bulle,1,3 Nuchanard Chiannilkulchai,2 André Pawlak,1 Jean Weissenbach,2 Gabor Gyapay,2 and Georges Guellaën1

1Institut National de la Santé et de la Recherche Médicale (INSERM), Unité 99, Hôpital Henri Mondor, 94010 Créteil, France; 2Centre National de la Recherche Scientifique (CNRS), URA 1922, Généthon, 91002 Evry CEDEX, France

Human genes containing triplet repeats have been demonstrated to be involved in several neurodegenerative diseases by expansion of the repeat in succeeding generations. To identify novel genes involved in such pathologies, we have isolated transcripts containing (CAG/CTG)n repeats using two approaches. First, we screened 4 × 10⁶ clones representing 10 copies of a human testis cDNA library using a (CAG)14 oligonucleotide probe. Among the 910 clones identified, the 243 clones with the strongest hybridization signal were sequenced partially from 3′ or 5′ ends. This provided us with 251 partial sequences that grouped into clusters corresponding to 39 genes, of which 19 represent unknown species. Second, we selected 203 additional ESTs containing (CAG/CTG)n repeats representing 121 clusters from the IMAGE consortium infant brain cDNA library. From these two series of sequences, we have localized 95 genes on human chromosomes using a whole-genome radiation hybrid panel (Genebridge 4). These genes are located on all of the chromosomes except for chromosome X, the highest density being observed on chromosome 19. [The sequence data described in this paper have been submitted to GenBank under accession nos. AA065241–AA065346.]

The human genome contains a large number of short tandem repeats (also known as microsatellites), including trinucleotide repeats in stretches of five or more, that have been detected in at least 50 genes (Riggins et al. 1992). Expansions of various types of these trinucleotide repeats have been implicated in genetic diseases. Although CGG and GAA repeats are expanded in different fragile X syndromes (Kremer et al. 1991; Verkerk et al. 1991; Yu et al. 1991; Knight et al. 1993; Jones et al. 1994; Nancarrow et al. 1994; Parrish et al. 1994) and Friedreich's ataxia, respectively (Campuzano et al. 1996), CTG and CAG repeats are involved in a larger series of pathologies. Amplification of CTG repeats has been described in myotonic dystrophy (MD) (Aslanidis et al. 1992; Brook et al. 1992; Fu et al. 1992), whereas amplifications of CAG repeats have been observed in spinal and bulbar muscular atrophy (SBMA) (La Spada et al. 1991), spinocerebellar ataxia type 1 (SCA1) (Orr et al. 1993), Huntington's disease (HD) (The Huntington's Disease Collaborative Research Group 1993), dentatorubral pallidoluysian atrophy (DRPLA) (Koide et al. 1994; Nagafuchi et al. 1994), Machado-Joseph disease or spinocerebellar ataxia type 3 (MJD or SCA3) (Kawaguchi et al. 1994), spinocerebellar ataxia type 2 (SCA2) (Imbert et al. 1996; Pulst et al. 1996; Sanpei et al. 1996), and spinocerebellar ataxia type 6 (SCA6), the last identified SCA associated with a CAG expansion (Zhuchenko et al. 1997). All of these diseases exhibit instability in transmission of the expanded repeat from parent to offspring. In some of them, the increase in repeat size correlates with an increase in disease severity and a decrease in age of onset or penetrance, as established for HD (Duyao et al. 1993) and MJD (Kawaguchi et al. 1994). In addition, the expansion of the repeats occurs in the transcribed part of the gene; GAA repeats are located in the intron, CGG and CTG in the untranslated exons, and CAG in the coding exons of the related gene.
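As an illustration of what "trinucleotide repeats in stretches of five or more" means at the sequence level, the following minimal sketch scans a nucleotide string for uninterrupted CAG or CTG runs. It is not the procedure used in this work (the clones were found by hybridization); the function name `find_triplet_tracts`, the `min_units` parameter, and the example sequence are hypothetical.

```python
import re

def find_triplet_tracts(seq, min_units=5):
    """Return (start, unit, n_units) for each uninterrupted CAG or CTG run.

    Hypothetical helper for illustration only: a run is reported when it
    contains at least `min_units` consecutive copies of the same triplet.
    """
    pattern = re.compile(r"(?:CAG){%d,}|(?:CTG){%d,}" % (min_units, min_units))
    tracts = []
    for match in pattern.finditer(seq.upper()):
        run = match.group(0)
        # start is 0-based; the unit is the first three bases of the run
        tracts.append((match.start(), run[:3], len(run) // 3))
    return tracts

# Made-up test sequence: 7 CAG units, a spacer, then 5 CTG units.
example = "ATG" + "CAG" * 7 + "CCGTT" + "CTG" * 5 + "TAA"
print(find_triplet_tracts(example))
```

Runs interrupted by another triplet (for example a CAA inside a CAG tract) are reported as separate runs by this sketch, so a tolerant pattern would be needed to treat them as one tract.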

3Corresponding author. E-MAIL [email protected]; FAX 33-1-48-98-09-08.

7:705–715 ©1997 by Cold Spring Harbor Laboratory Press ISSN 1054-9803/97 $5.00 GENOME RESEARCH

The study of CAG repeats is of special interest for at least three reasons: (1) As mentioned previously, this type of repeat is involved in at least six neurodegenerative diseases; (2) CAG repeats are translated into polyglutamine stretches, domains that are often present in transcription factors and may function as a polar zipper interacting with other proteins (Perutz et al. 1994); and (3) experiments in Escherichia coli have shown that CAG/CTG tracts are expanded at least eight times more frequently than any of the other nine triplets (Ohshima et al. 1996).

Therefore, the identification and mapping of genes containing CAG repeats are of importance, as such genes represent potential candidates for diseases that exhibit genetic anticipation (La Spada et al. 1994), such as unipolar and bipolar disorders (McInnis et al. 1993; Engstrom et al. 1995; O'Donovan et al. 1995), autosomal dominant cerebellar ataxia (ADCA) type I (Durr et al. 1996) and type II (Benomar et al. 1995), familial nonspecific dementia (Brown et al. 1995), and schizophrenia (Ross et al. 1993; Bassett and Honer 1994; Morris et al. 1995; O'Donovan et al. 1995; Bowen et al. 1996), although anticipation is still questionable for this last disease (Petronis et al. 1996; Sasaki et al. 1996).

To aid in the identification of new genes containing CAG/CTG repeats, we decided to look for transcripts containing such repeats in human testis. This tissue expresses a large number of mRNAs, many of which are shared exclusively with nervous tissue, such as neuropeptide precursors, proenkephalin, or pro-opiomelanocortin (Wolgemuth and Watrin 1991), or belong to neurotransmitter biosynthesis (glutamate decarboxylase) (Persson et al. 1990). In the present work, the screening of a human testis cDNA library with a CAG-specific probe resulted in the identification of 39 CAG-containing genes, 19 of them corresponding to new genes. In parallel, we analyzed expressed sequence tags (ESTs) from the IMAGE consortium obtained from infant brain cDNA clones positive for CAG hybridization. From this analysis, we collected 121 CAG-containing clusters. From these two pools of transcripts, we have mapped 95 CAG/CTG-containing genes using radiation hybrid mapping.

RESULTS

The strategy for the analysis of sequences containing CAG repeats is outlined in Figure 1.

Figure 1 Schematic outline of the strategy used to identify and map new CAG-containing transcripts.

Isolation of Human Testis mRNA-Containing CAG Repeat (Group A)

The screening of 4 × 10⁶ clones from a human testis cDNA library, using (CAG)14 as a probe, produced 910 positive clones. We selected the 243 clones eliciting the strongest signal and sequenced them from their 3′ end, as this region is likely to correspond to a single exon (Hawkins 1988) and, therefore, is more suitable to derive sequence-tagged sites (STSs) for mapping purposes (Hayes et al. 1996). After the analysis of these sequences, we rejected all clones lacking a poly(A) tail, and we sequenced the 5′ end of the insert when no repeat was detected in the 3′ end sequence or when the sequence in the 3′ end was not informative enough. From the 243 cDNA clones, we obtained 251 sequences from the 3′ or 5′ ends that ranged between 200 and 400 bp, with an average size of 380 bp. These sequences have been deposited in dbEST under accession numbers AA065241–AA065346.

These sequences were submitted to two types of analysis. First, they were compared among themselves to eliminate exact duplicates. This analysis
led us to discard 146 sequences (58%). We kept overlapping sequences as well as identical sequences containing repeats of various sizes. The remaining 105 sequences (42%) were assembled into 39 independent clusters (group A) (Table 1). Sequences were incorporated in the same cluster when they exhibited at least 98% identity in nucleic acid sequence. Second, the sequences corresponding to these 39 clusters were compared with sequences present in nucleic acid and protein sequence databases using BLAST programs (Altschul et al. 1990): 12 clusters corresponded to genes already known in human, 8 clusters were found to be homologous to known genes in human or in other species, 16 clusters only matched with anonymous ESTs, and 3 clusters did not give any match at all.

Analysis of the repeats present in the sequences revealed that 21 clusters exhibit a CAG or CTG repeat located either in the 3′ or in the 5′ region of the cDNA. One-third of these repeats contained three to nine triplets, the remaining two-thirds had between 10 and 20 triplets, and three cases had >20 repeats. In 15 clusters, we observed insertions of CAA or TTG triplets in the CAG or CTG repeats, respectively, thus extending the stretch of glutamine. In six clusters, we observed small insertions of nine bases or less in the CAG repeat. Two variations in the size of the repeat were observed: 13, 15, and 17 CAG repeats are present in the different cDNAs of cluster 14, as well as 10 and 11 CAG repeats in cluster 25. In 18 clusters, we did not detect any repeat in the partial sequences that were obtained. Nevertheless, it remains likely that these clones contain CAG repeats. This statement is based on the fact that 4 clusters among the 18, corresponding to already known human genes, identify genes containing CAG repeats (e.g., monocyte differentiation antigen precursor, myotonic dystrophy kinase, nucleolar phosphoprotein p130, human 54-kD protein mRNA). Only one cluster matches a human gene that contains a CAG-rich region with no perfect successive repeats (e.g., human XRCC4 mRNA). The complete sequence of these cDNAs will definitely prove the presence of the CAG repeat.

Selection of Human EST-Containing CAG Repeat (Group B)

The IMAGE consortium screened a subset of 40,000 clones from the normalized infant brain cDNA library 1 (NIB1) of B. Soares, using CAG oligonucleotides. One hundred eighty-six positive clones likely to contain CAG were obtained and listed in the IMAGE web page (http://www.bio.llnl.gov/bbrp/image/itri.htlm). From this series of clones, 203 sequences (350–500 bp), from either the 3′ or 5′ region of the insert (mean 1820 bp), were recovered from dbEST. These sequences were assembled into 121 clusters according to the same strategy as the one described for the human testis cDNA clones (Fig. 1).

Chromosomal Localization of cDNA Clones and ESTs

The human testis CAG/CTG-containing clones and the human ESTs were localized using a radiation hybrid panel. In this technique, segments of human chromosomes obtained by X-ray irradiation are rescued in rodent recipient cells. A linkage distance can be established on the basis of the scoring of the number of breaks between two loci by measuring the frequency of coretention (Cox et al. 1990). In our case the CAG/CTG-containing clones were mapped by using 90 hybrids from the Genebridge 4 radiation hybrid panel (Gyapay et al. 1996). In general, for each cluster only one sequence, located preferentially at the 3′ end, was retained for primer design. For some infant brain EST clusters, the localization was achieved by using oligonucleotides derived from the same cluster of ESTs of the CAG-positive clone present in Unigene (Schuler et al. 1996).

Of the 39 genes expressed in human testis that we analyzed, 27 (69%) were localized (Table 2), and of 121 EST clusters derived from the brain cDNA library, 68 (57%) were mapped successfully (Table 3). Several clusters could not be localized for different reasons: (1) primers failed to amplify (majority of cases); (2) presence of background from hamster DNA; (3) human bands had a different size from that expected; and (4) sequences were too short for primer determination.

Table 1. Summary of Testis cDNA Sequences

(C) The cluster number; (name) the name of the clone, with the last digit indicating the primer used for the sequence (s: SP6; t: T7; m: −21M13; r: M13 reverse); (description) a short name of the sequence hit with its accession number in the column ref. database; (% id; % sim) the percentages of identity and similarity for the alignment, respectively. DNA matches are indicated by a blank in the percentage similarity field. (S) The species of the homologous sequence (H: human, M: mouse, Y: yeast); (repeat) the number of CAG or CTG repeats that were detected in the sequence (+: an interruption inside the repeat; > or <: the repeat is localized at the beginning or at the end of the sequence, respectively); [(rep. ins.) repeat insertion] the number of CAA or TTG inserted in the CAG or CTG repeat.

DISCUSSION

In the present study we used cDNA sequences obtained by two different approaches to identify and map new human genes containing CAG repeats. The screening of 4 × 10⁶ clones, equivalent to 10 copies of a human testis cDNA library, allowed us to identify 910 clones containing CAG repeats. The 243 clones that exhibited the strongest hybridization signals were further characterized. When duplicates were eliminated, these clones appear to correspond to 39 genes containing CAG repeats. With respect to the number of plated clones, this represents roughly 1 in 10,000 (39 in 400,000 for one copy of the library). This number is lower than the ratio of 37 in 10,000 reported by Néri et al. (1996) from a human fetal brain cDNA library, or 28 in 10,000 from a human cerebral cortex cDNA library (Li et al. 1993), and 7 in 10,000 from another fetal brain library (Riggins et al. 1992). Such differences might result from the differential expression of transcripts between testis and brain, but also from differing experimental conditions. In the series that we analyzed, we selected clones with intense hybridization signals. In addition, preliminary tests that we performed on 26 of the remaining clones (667) that gave low hybridization signals allowed us to identify new transcripts that contain 6–11 CAGs. Thus, the population of transcripts that contain CAG repeats in normal human testis is certainly larger than our initial observations indicate. In addition, at least in our study, the intensity of the hybridization signal correlates roughly with the number of repeats.

The average size of the CAG/CTG repeats in the human testis cDNAs analyzed in this study (strong hybridization signal) was ∼13, with at least 30% of the 39 clusters above this value. The different lengths that we observed are within the same range of repeat numbers usually observed in normal alleles of disease genes (5–54 trinucleotide repeats). Other reports, analyzing either genomic DNA or cDNA, mentioned lower numbers of CAG repeats. Gastier et al. (1996), analyzing the (CAG/CTG)n repeat lengths in 479 unique genomic clones, observed 30% of the repeats with six triplets, whereas the repeats with 13 copies represented only 2%. In human fetal brain cDNA, Néri et al. (1996) observed only 13 of 88 (15%) clones that exhibited repeats of size above nine. The larger repeats that we observed result from our selection of the clones with the highest hybridization signal, as mentioned above.

In most of the clusters, the CAG repeats were not perfect. First, in 15 clusters, we observed the presence of CAA triplets in the CAG repeat. This triplet also encodes glutamine and is also present in the genes for HD, DRPLA, SBMA, SCA2, and MJD1. Second, for six clusters, we detected small insertions with sizes between 3 and 9 nucleotides. For five of those clusters, the insertion did not change the possible open reading frame (ORF). Similar insertions were found in the genes of SCA1, SBMA, and MJD1. As already described for the SCA1 gene, this might contribute to the stabilization of the repeat length (Chung et al. 1993). For the remaining cluster, the sequence was not accurate enough to determine whether the insertion induces a frameshift in the ORF.

Until now genes with CAG expansions that are likely involved in genetic diseases have two specific


Table 2. Localization of CAG-Containing cDNA from Human Testis

(chr) The chromosomal assignment; (C) the cluster number; (name) the name of the clone with the last digit indicating the primer used for the sequence (s: SP6; t: T7; m: −21M13; r: M13 reverse); (sub-chromosomal localization) the regional localization of the sequence with the flanking framework markers defined by their D number; (log odds) the decimal logarithm of the likelihood ratio showing how many times it is more probable that the EST is in the given interval than elsewhere (*: localization to be taken with precaution, †: the second D number is not contiguous to the first D number indicated); (RHdb) the RH identifier of the localized EST in the Radiation Hybrid database; (description) the name of the homologous sequences if known.
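To make the (log odds) column concrete, here is a minimal sketch of how a decimal log-likelihood ratio can be read: a value of 3 means the placement is 10³ = 1000 times more probable than the alternative. The interval names and likelihood values below are invented for illustration, and taking the second-best interval as "elsewhere" is an assumption of this sketch, not necessarily how the mapping software computed the published values.

```python
import math

def log_odds(likelihood_interval, likelihood_elsewhere):
    """Decimal logarithm of the likelihood ratio (interval vs. elsewhere)."""
    return math.log10(likelihood_interval / likelihood_elsewhere)

# Hypothetical likelihoods of one EST placed in three marker intervals.
likelihoods = {
    "markerA-markerB": 4.0e-10,
    "markerB-markerC": 4.0e-13,
    "markerC-markerD": 1.0e-14,
}

# Keep the most likely interval, as a best-placement rule would.
best = max(likelihoods, key=likelihoods.get)

# Assumption: compare the best interval against the next best alternative.
runner_up = max(v for k, v in likelihoods.items() if k != best)
lod = log_odds(likelihoods[best], runner_up)

print(best, round(lod, 2))
```

Under these made-up numbers the best interval is favored by about three log units, that is, roughly a thousandfold.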

features: (1) the stretch of CAG is translated into polyglutamine; and (2) in general, this locus is highly polymorphic. With partial sequencing of cDNA, it is impossible to predict with accuracy the ORF in which the CAG repetition is inserted. Therefore, such stretches could be translated as poly(Gln), poly(Ser), or poly(Ala). As an example, in our clones there is one that corresponds to the nucleolar phosphoprotein p130 and contains a CAG repeat coding for a poly(Ser), which is not known to be involved in a neurodegenerative disease. Nevertheless, one cannot exclude implication of a CAG amplification in a poly(Ser) or poly(Ala) in genetic diseases. For example, an expansion of a polyalanine stretch in the amino-terminal region of HOXD13 is associated with synpolydactyly (Muragaki et al. 1996). For two genes, we detected some polymorphisms in the length of the CAG/CTG repeat. We observed a variation of 13 to 17 CAGs for gene 14, and 10 or 11 CAGs for gene 25. This could reflect the allelic mosaicism observed previously for this kind of gene in human testis (Zühlke et al. 1993; Telenius et al. 1995; Zhang et al. 1995), which may occur during spermatogenesis.

From the 39 clusters that we identified, 19 correspond to unknown genes and 20 to already characterized genes. Among this population, we retrieved two of the seven genes already described to be involved in CAG diseases, the MJD1 protein and the myotonic dystrophy kinase, which are the products of genes involved in Machado-Joseph disease and myotonic dystrophy, respectively. This is the first observation that indicates an active transcription of these genes in human testis.

The cloning of cDNAs containing CAG repeats by hybridization is an efficient method for the detection of new candidate genes, but this technique is very time-consuming and it is difficult to eliminate redundancy. As a complementary approach, we screened databases for ESTs containing CAG. The IMAGE consortium identified, by hybridization with a CAG-specific probe, a large series of transcripts that are likely to contain such repeats. The corresponding cDNAs, once characterized by 3′ partial sequencing, did not reveal long stretches of CAG, but larger repeats can be present upstream in the transcripts. This approach, although less reliable, allows us to screen rapidly a larger pool of cDNA from various tissues.

Until now, the various studies that have been done to identify new genes containing CAG repetitions have reported ∼100 genes (Riggins et al. 1992; Li et al. 1993; Jiang et al. 1995; Aoki et al. 1996; Néri et al. 1996). Of these genes, only 17 have been assigned to chromosomes and 7 of them sublocalized. In the present study, we have localized the largest group of genes containing CAG repeats. We have mapped 27 testis cDNAs and 68 EST sequences containing CAG/CTG repeats using a radiation hybrid panel. All the present localizations agree with the previous assignments when available.

In our study, CAG-containing genes were found on all chromosomes, except for chromosome X (chromosome Y was not included in the whole-genome radiation hybrid panel). The distribution of the CAG-containing genes is not even, the largest number (10) being present on chromosome 19, whereas only one gene was detected on chromosome 2.

Some of the localizations of anonymous sequences that we have found are very close to loci involved in autosomal dominant genetic diseases associated with progressive neuropathy such as Charcot-Marie-Tooth disease type B (3q13–q22), schizophrenia disorder 1 (5q11.2–q13.3), Charcot-Marie-Tooth neuropathy (8q13–q21.1), types 4 and 8 of spinocerebellar ataxia (16q, 10q23.1–24.1), or schizophrenia disorder 4 (22q11). Further studies are needed to investigate whether some of these genes might be related to these genetic diseases.

Table 3. Summary and Localization of CAG Sequences from IMAGE Source

(chr) The chromosomal assignment; (C) the cluster number; (accession of ESTs-CAG) the accession numbers of CAG-positive clones; (sub-chromosomal localization) the regional localization of the sequence with the flanking framework markers defined by their D number; (log odds) the decimal logarithm of the likelihood ratio showing how many times it is more probable that the EST is in the given interval than elsewhere (*: localization to be taken with precaution, †: the second D number is not contiguous to the first D number indicated); (accession) the accession number in dbEST of the EST sequence used for the chromosomal localization; (RHdb) the RH identifier of the localized EST in the Radiation Hybrid database; (description) the name of the homologous sequences, when known.

METHODS

cDNA Cloning

Poly(A)+ RNA from the testis of a 27-year-old man was isolated as described previously (Matsuoka et al. 1992). The cDNA library was constructed using the Superscript plasmid system and plasmid cloning kit (Life Technologies, Inc.) according to manufacturer's specifications, except that reverse transcription was done at 42°C for 1 hr. The cDNA was size-selected on S400 Sephacryl columns, and the material >700 bp was inserted in an oriented manner into pSPORT1 vector, using NotI–SalI adaptors. The resulting plasmids, once transfected into XL1 blue, gave 360,000 independent colonies. Ten copies of this library (4 × 10⁶ clones) were plated onto filters and hybridized with a 5′-32P-labeled oligonucleotide (CAG)14 in 6× SSC [1× SSC: 150 mM NaCl/15 mM sodium citrate (pH 7.0)]; 5× Denhardt's solution (1× Denhardt's solution: 0.2% bovine serum albumin, 0.02% Ficoll, 0.02% polyvinylpyrrolidone); 0.1% SDS; 5 mM EDTA (pH 7.5); and 100 µg/ml of denatured salmon sperm DNA for 16 hr at 42°C. After hybridization, filters were washed twice in 0.5× SSC with 0.1% SDS at 65°C for 1 hr and exposed at −80°C to Amersham Hyperfilm for 16 hr with one intensifying screen.

DNA Sequencing

Plasmid minipreps were performed using a minikit Tip 20 (Qiagen, Chatsworth, CA) according to manufacturer's specifications. Plasmid DNA concentrations were adjusted to 250 ng/µl based on absorbance at 260 nm. Plasmids were sequenced according to Sanger's method using fluorescent dye-labeled primers and cycle sequencing kits (Applied Biosystems) as described previously (Pawlak et al. 1995). The reaction products were analyzed on a 373A automated DNA sequencer (Applied Biosystems). Sequencing was done systematically on the 3′ end of the cDNA using SP6 or −21M13 primer and, when necessary, on the 5′ end using T7 or M13 reverse primer.

Sequence Analysis

The sequences were edited manually and limited to 400 bp and 2% ambiguities (N). The redundancy was evaluated by internal comparison of those sequences using the FASTA program. The sequences were sent to the National Center for Biotechnology Information (NCBI) for BLASTX and BLASTN analysis (Altschul et al. 1990) in the nonredundant nucleic acid and protein libraries. Sequence similarities identified by the BLAST programs were considered statistically significant when scores were >150 and >75 for nucleic acid and amino acid sequences, respectively, or when the Poisson P value was <0.05. The BLASTX and BLASTN results for each clone were analyzed simultaneously and processed manually. We always selected the protein match when a hit was detected with both types of analyses.

PCR for Radiation Hybrid Mapping

Primers for the PCRs were designed using the program described by Rychlik and Rhoads (1989), which was adapted to large-scale primer design. The repeat elements, such as Alu, Kpn, and LINE, were masked first and then the primers were selected according to the desired criteria. PCRs were performed on DNA obtained from the Genebridge 4 radiation hybrid panel (Gyapay et al. 1996). The PCRs were carried out in a volume of 15 µl. The final concentrations in the PCR were as follows: 2 ng/µl of DNA, 125 nM dNTP (31 nM of each), 1.33 µM primers (of each), 50 mM KCl, 2 mM MgCl2, 0.1% of Triton X-100, 0.01% of gelatine, 10 mM of Tris-HCl (pH 9.0) (25°C), and 0.25 units per 15 µl of Taq polymerase. The samples were overlaid with heavy mineral oil. Amplifications were performed using the hot start procedure. The first three cycles consisted of 30 sec of annealing at 61°C and 40 sec of denaturing at 94°C. The annealing temperature was lowered successively by 2°C for each consecutive three cycles until 55°C, followed by 25 further cycles at an annealing temperature of 55°C. After completion of the PCR reaction, 4 µl of loading mixture containing 0.1% (wt/vol) bromophenol blue and 50% (vol/vol) glycerol was added to each well. The PCR products were allowed to migrate on an agarose gel containing 1% SeaKem and 3% NuSieve agarose in TBE buffer with 0.25 µg/ml ethidium bromide. Then, the images of the gels were recorded with a CCD camera and scoring of the results was carried out semiautomatically with the BioImage software developed by Millipore. Typing results were downloaded into a database. The calculations were performed using the RHMAP package (Boehnke et al. 1991). Positioning of the CAG/CTG-containing clones or ESTs was carried out relative to ∼1000 evenly distributed Genethon genetic markers. In the course of the calculations, the program positioned the ESTs into each interval defined by the adjacent genetic markers and the probability of this position was calculated. The highest probability was retained and considered as the real position of the given locus.

ACKNOWLEDGMENTS

We thank Y. Laperche and T. Rohn for the critical reading of the manuscript, R. Derreumaux for excellent technical assistance, and Edith Grandvilliers for her secretarial assistance. This work was funded by INSERM and the Groupement de Recherche sur l'Etude des Génomes.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.

REFERENCES

Altschul, S.F., W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403–410.

Aoki, M., L. Koranyi, A.C. Riggs, J. Wasson, K.C. Chiu, M. Vaxillaire, P. Froguel, S. Gough, L. Liu, H. Donis-Keller et al. 1996. Identification of trinucleotide repeat-containing genes in human pancreatic islets. Diabetes 45: 157–164.

Aslanidis, C., G. Jansen, C. Amemiya, G. Shutler, M. Mahadevan, C. Tsilfidis, C. Chen, J. Alleman, N.G. Wormskamp, M. Vooijs et al. 1992. Cloning of the essential myotonic dystrophy region and mapping of the putative defect. Nature 355: 548–551.

Bassett, A.S. and W.G. Honer. 1994. Evidence for anticipation in schizophrenia. Am. J. Hum. Genet. 54: 864–870.

Benomar, A., L. Krols, G. Stevanin, G. Cancel, E. LeGuern, G. David, H. Ouhabi, J.J. Martin, A. Durr, A. Zaim et al. 1995. The gene for autosomal dominant cerebellar ataxia with pigmentary macular dystrophy maps to chromosome 3p12-p21.1. Nature Genet. 10: 84–88.

Boehnke, M., K. Lange, and D.R. Cox. 1991. Statistical methods for multipoint radiation hybrid mapping. Am. J. Hum. Genet. 49: 1174–1188.

Bowen, T., C. Guy, G. Speight, L. Jones, A. Cardno, K. Murphy, P. McGuffin, M.J. Owen, and M.C. O'Donovan. 1996. Expansion of 50 CAG/CTG repeats excluded in Schizophrenia by application of a highly efficient approach using repeat expansion detection and a PCR screening set. Am. J. Hum. Genet. 59: 912–917.

Brook, J.D., M.E. McCurrach, H.G. Harley, A.J. Buckler, D. Church, H. Aburatani, K. Hunter, V.P. Stanton, J.P. Thirion, T. Hudson et al. 1992. Molecular basis of myotonic dystrophy: Expansion of a trinucleotide (CTG) repeat at the 3′ end of a transcript encoding a protein kinase family member. Cell 68: 799–808.

Brown, J., A. Ashworth, S. Gydesen, A. Sorensen, M. Rossor, J. Hardy, and J. Collinge. 1995. Familial non-specific dementia maps to chromosome 3. Hum. Mol. Genet. 4: 1625–1628.

Campuzano, V., L. Montermini, M.D. Molto, L. Pianese, M. Cossée, F. Cavalcanti, E. Monros, F. Rodius, F. Duclos, A. Monticelli et al. 1996. Friedreich's ataxia: Autosomal recessive disease caused by an intronic GAA triplet repeat expansion. Science 271: 1423–1427.

Chung, M.Y., L.P.W. Ranum, L.A. Duvick, A. Servadio, H.Y. Zoghbi, and H.T. Orr. 1993. Evidence for a mechanism predisposing to intergenerational CAG repeat instability in spinocerebellar ataxia type 1. Nature Genet. 3: 254–258.

Cox, D.R., M. Burmeister, E.R. Price, S. Kim, and R.M. Myers. 1990. Radiation hybrid mapping: A somatic cell genetic method for constructing high-resolution maps of mammalian chromosomes. Science 250: 245–250.

Durr, A., G. Stevanin, G. Cancel, C. Duyckaerts, N. Abbas, O. Didierjean, H. Chneiweiss, A. Benomar, O. Lynn-Caen, J. Julien et al. 1996. Spinocerebellar ataxia 3 and Machado-Joseph disease: Clinical, molecular, and neuropathological features. Ann. Neurol. 39: 490–499.

Duyao, M., C. Ambrose, R. Myers, A. Novelletto, F. Persichetti, M. Frontali, S. Folstein, C. Ross, M. Franz, M. Abbott et al. 1993. Trinucleotide repeat length instability and age of onset in Huntington's disease. Nature Genet. 4: 387–392.

Engstrom, C., A.S. Thornlund, A.L. Johansson, M. Langstrom, J. Chotai, R. Adolfsson, and P.O. Nylander. 1995. Anticipation in unipolar affective disorder. J. Affect. Disord. 35: 31–40.

Fu, Y.H., A. Pizzuti, R.G. Fenwock, J. King, S. Raynarayan, P.W. Dunne, J. Dubel, G.A. Nasser, T. Ashizawa, P. de Jong et al. 1992. An unstable triplet repeat in a gene related to myotonic muscular dystrophy. Science 255: 1256–1258.

Gastier, J.M., T. Brody, J.C. Pulido, T. Businga, S. Sunden, X. Hu, S. Maitra, K.H. Buetow, J.C. Murray, V.C. Sheffield et al. 1996. Development of a screening set for new (CAG/CTG)n dynamic mutations. Genomics 32: 75–85.

Gyapay, G., K. Schmitt, C. Fizames, H. Jones, N. Vega-Czarny, D. Spillett, D. Muselet, J.D. Prud'Homme, C. Dib, C. Auffray et al. 1996. A radiation hybrid map of the human genome. Hum. Mol. Genet. 5: 339–346.

Hawkins, J.D. 1988. A survey on intron and exon lengths. Nucleic Acids Res. 16: 9893–9908.

Hayes, P.D., K. Schmitt, H.B. Jones, G. Gyapay, J. Weissenbach, and P.N. Goodfellow. 1996. Regional assignment of human ESTs by whole-genome radiation hybrid mapping. Mamm. Genome 7: 446–450.

The Huntington's Disease Collaborative Research Group. 1993. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington's disease chromosomes. Cell 72: 971–983.

Imbert, G., F. Saudou, G. Yvert, D. Devis, Y. Trottier, J.-M. Garnier, C. Weber, J.L. Mandel, G. Cancel, N. Abbas et al. 1996. Cloning of the gene for spinocerebellar ataxia 2 reveals a locus with high sensitivity to expanded CAG/glutamine repeats. Nature Genet. 14: 285–291.

Jiang, J.-X., R.H. Lekanne Deprez, E.C. Zwarthoff, and P.J. Riegman. 1995. Characterization of four novel CAG repeat-containing cDNAs. Genomics 30: 91–93.

Jones, C., P. Slijepcevic, S. Marsh, E. Baker, W.Y. Langdow, R.I. Richards, and A. Tunnacliffe. 1994. Physical linkage of the fragile site FRA11B and a Jacobsen syndrome chromosome deletion break point in 11q23.3. Hum. Mol. Genet. 3: 2123–2130.

Kawaguchi, Y., T. Okamoto, M. Taniwaki, M. Aizawa, M. Inoue, S. Katayama, H. Kawakami, S. Nakamura, M.

Li, S.H., M.G. McInnis, R.L. Margolis, S.E. Antonarakis, and C.A. Ross. 1993. Novel triplet repeat containing genes in human brain: Cloning, expression, and length polymorphisms. Genomics 16: 572–579.

Matsuoka, I., G. Giuili, M. Poyard, D. Stengel, J. Parma, G. Guellaen, and J. Hanoune. 1992. Localization of adenylyl and guanylyl cyclase in rat brain by in situ hybridization: Comparison with calmodulin mRNA distribution. J. Neurosci. 12: 3350–3360.

McInnis, M.G., F.J. McMahon, G.A. Chase, S.G. Simpson, C.A. Ross, and J.R. DePaulo. 1993. Anticipation in bipolar affective disorder. Am. J. Hum. Genet. 53: 385–390.

Morris, A.G., E. Gaitonde, P.J. McKenna, J.D. Mollon, and D.M. Hunt. 1995. CAG repeat expansions and schizophrenia: Association with disease in females and with early age-at-onset. Hum. Mol. Genet. 4: 1957–1961.

Muragaki, Y., S. Mundlos, J. Upton, and B.R. Olsen. 1996. Altered growth and branching patterns in Synpolydactyly caused by mutations in HOXD13. Science 272: 548–551.

Nagafuchi, S., H. Yanagisawa, K. Sato, T. Shirayama, E. Ohsaki, M. Bundo, T. Takeda, K. Tadokoro, I. Kondo, N. Murayama et al. 1994. Dentatorubral pallidoluysian atrophy expansion of an unstable CAG trinucleotide on chromosome 12p. Nature Genet. 1: 14–18.

Nancarrow, J.K., E. Kremer, K. Holman, H. Eyre, N.A. Dogett, D. Paslier, D.F. Callen, G.R. Sutherland, and R.I. Richards. 1994. Implications of FRA16A structure for the mechanism of chromosomal fragile sites genesis.
Science Nishimura, I. Akiguchi et al. 1994. CAG expansions in a 264: 1938–1944. novel gene for Macado-Joseph disease at chromosome 14q32.1. Nature Genet. 8: 221–228. Ne´ri, C., V. Albanee`se, A.-S. Le`bre, S. Holbert, C. Saada, L. Bougueleret, S. Meier-Ewert, I. LeGall, P. Millasseau, H. Bui Knight, S., A.V. Flannery, M.C. Hirst, L. Campbell, Z. et al. 1996. Survey of CAG/CTG repeats in human cDNAs Christodoulou, S.R. Phelps, J. Pointon, H.R. representing new genes: Candidates for inherited Middleton-Price, A. Barnicoat, M.E. Pembrey et al. 1993. neurological disorders. Hum. Mol. Genet. 5: 1001–1009. Trinucleotide repeat amplification and hypermethylation of a CpG island in FRAXE mental retardation. Cell O’Donovan, M.C., C. Gui, N. Craddock, K.C. Murphy, A.G. 74: 127–134. Cardno, L.A. Jones, M.J. Owen, and P. McGuffin. 1995. Expanded CAG repeats in schizophrenia and bipolar Koide, R., T. Ikeuchi, O. Onodera, H. Tanaka, S. Igarashi, K. disorder. Nature Genet. 10: 380–381. Endo, H. Takahashi, R. Kondo, A. Ishikawa, T. Hayashi et al. 1994. Unstable expansion of CAG repeat in hereditary Ohshima, K., S. Kang, and R.D. Wells. 1996. CTG triplet dentatorubral-pallidoluysian atrophy (DRPLA). Nature Genet. repeats from human hereditary diseases are dominant 1: 9–13. genetic expansion products in Escherichia coli. J. Biol. Chem. 271: 1853–1856. Kremer, E.J., S. Yu, M. Pritchard, R. Nagaraja, D. Heitz, M. Lynch, E. Baker, V.J. Hyland, R.D. Little, M. Wada et al. Orr, H.T., M. Chung, S. Banfi, T.J. Kwiatkowski, A. Serviado, 1991. Isolation of a human DNA sequence which spans the A.L. Beaudet, A.E. McCall, L.A. Duvick, L.P. Ranum, and fragile X. Am. J. Hum. Genet. 49: 656–661. H.Y. Zoghbi. 1993. Expansion of an unstable trinucleotide (CAG) repeat in spinocerebellar ataxia type 1. Nature Genet. La Spada, A.R., E.M. Wilson, D.B. Lubahn, A.E. Harding, 4: 221–226. and K.H. Fischbeck. 1991. Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy. 
Pawlak, A., C. Toussaint, I. Levy, F. Bulle, M. Poyard, R. Nature 352: 77–79. Barouki, and G. Guellaen. 1995. Characterization of a large population of mRNAs from human testis. Genomics La Spada, A.R., H.L. Paulson, and K.H. Fishbeck. 1994. 26: 151–158. Trinucleotide repeat expansion in neurological disease. Ann. Neurol. 36: 814–822. Parrish, J.E., B.A. Oostra, A.J.M.H. Verkerk, C.S. Richards, J.

714 GENOME RESEARCH Downloaded from genome.cshlp.org on October 4, 2021 - Published by Cold Spring Harbor Laboratory Press

MAPPING OF HUMAN GENES CONTAINING CAG/CTG REPEATS

Reynolds, I.S. Spikes, L.G. Shaffer, and D.L. Nelson. 1994. 1993. A novel gene containing a trinucleotide repeat that is Isolation of a GCC repeat showing expansion in FRAXF, a expanded and unstable on Huntington’s disease fragile site distal to FRAXA and FRAXE. Nature Genet. chromosomes. Cell 72: 971–983. 8: 229–235. Verkerk, A.J.M.H., M. Pierretti, J.S. Sutcliffe, Y.H. Fu, D.P. Persson, H., M. Pelto-Huikko, and M. Metsis. 1990. Kuhl, and A. Pizzuti. 1991. Identification of a gene (FMR-1) Expression of neurotransmitter synthesizing enzyme containing a CGG repeat coincident with a breakpoint glutamic acid decarboxylase in male germ cells. Mol. Cell. cluster region exhibiting length variation in fragile X Biol. 10: 4701–4711. synchrome. Cell 65: 905–914.

Perutz, M.F., T. Johnson, M. Suzuki, and J.T. Finch. 1994. Wolgemuth, D. and F. Watrin. 1991. List of cloned mouse Glutamine repeats as polar zippers: Their possible role in genes with unique expression patterns during inherited neurodegenerative diseases. Proc. Natl. Acad. Sci.. spermatogenesis. Mamm. Genome 1: 283–288. 91: 5355–5358. Yu, S., M. Pritchard, E. Kremer, M. Lynch, J. Nancarrow, E. Petronis, A., A.S. Bassett, W.G. Honer, J.B. Vincent, Y. Baker, K. Holman, J.C. Mulley, S.T. Warren, D. Schlessinger Tatuch, T. Sasaki, D.J. Ying, T.A. Klempar, and J.L. Kennedy. et al. 1991. Fragile X genotype characterized by an unstable 1996. Search for unstable DNA in Schizophrenia families region of DNA. Science 252: 1179–1181. with evidence for genetic anticipation. Am. J. Hum. Genet. 59: 905–911. Zhang, L., K.H. Fischbeck, and N. Arnheim. 1995. CAG repeat length variation in sperm from a patient with Pulst, S.-M., A. Nechiporuk, T. Nechiporuk, S. Gispert, X-N. Kennedy’s disease. Hum. Mol. Genet. 4: 303–305. Chen, I. Lopes-Cendes, S. Pearlman, S. Starkman, G. Orozco-Diaz, A. Lunkes et al. 1996. Moderate expansion of Zhuchenko, O., J. Bailey, P. Bonnen, T. Ashizawa, D.W. a normally biallelic trinucleotide repeat in spinocerebellar Stockton, C. Amos, W.B. Dobyns, S.H. Subramony, H.Y. ataxia type 2. Nature Genet. 14: 269–276. Zoghbi, and C.C. Lee. 1997. Autosomal dominant cerebellar ataxia (SCA6) associated with small polyglutamine Riggins, G.J., L.K. Lokey, J.L. Chastain, H.A. Leiner, S.L. expansions in the alpha 1A-voltage-dependent calcium Sherman, K.D. Wilkinson, and S.T. Warren. 1992. Human channel. Nature Genet. 15: 62–69. genes containing polymorphic trinucleotide repeats. Nature Genet. 2: 186–191. Zu¨hlke, C., O. Riess, B. Bockel, H. Lange, and U. Thies. 1993. Mitotic stability and meiotic variability of the (CAG)n Ross, C.A., M.G. McInnis, R.L. Margolis, and S.H. Li. 1993. repeat in the Huntington disease gene. Hum. Mol. Genet. 
Genes with triplet repeats: Candidate mediators of 2: 2063–2067. neuropsychiatric disorders. Trends Neurosci. 16: 254–260.

Rychlik,W. and R.E. Rhoads 1989. A computer program for Received February 13, 1997; accepted in revised form May 1, choosing optimal oligonucleotides for filter hybridization, 1997. sequencing and in vitro amplification of DNA. Nucleic Acids Res. 17: 8543–8551.

Sanpei, K., H. Takano, S. Igarashi, T. Sato, M. Oyake, H. Sasaki, A. Wakisaka, K. Tashiro, Y. Ishida, T. Ikeuchi et al. 1996. Identification of the spinocerebellar ataxia type 2 gene using a direct identification of repeat expansion and cloning technique, DIRECT. Nature Genet. 14: 277–284.

Sasaki, T., E. Billett, A. Petronis, D. Ying, T. Parsons, F.M. Macciardi, H.Y. Meltzer, J. Lieberman, R.T. Joffe, C.A. Ross et al. 1996. Psychosis and genes with trinucleotide repeat polymorphism. Hum. Genet. 97: 244–246.

Schuler, G.D., M.S. Boguski, E.A. Stewart, L.D. Stein, G. Gayapay, K. Rice, R.E. White, P. Rodriguez-Tome, A. Aggarwal, E. Bajork et al. 1996. A gene map of the human genome. Science 274: 540–546.

Telenius, H., E. Almqvist, B. Kremer, N. Spence, F. Squitieri, K. Nichol, U. Grandell, E. Starr, C. Benjamin, I. Castaldo et al. 1995. Somatic mosaicism in sperm is associated with intergenerational (CAG)n changes in Huntington disease. Hum. Mol. Genet. 4: 189–195.

The Huntington’s Disease Collaborative Research Group.

GENOME RESEARCH 715 Downloaded from genome.cshlp.org on October 4, 2021 - Published by Cold Spring Harbor Laboratory Press

Identification and Chromosomal Localization of Human Genes Containing CAG/CTG Repeats Expressed in Testis and Brain

Frédérique Bulle, Nuchanard Chiannilkulchai, André Pawlak, et al.

Genome Res. 1997 7: 705-715 Access the most recent version at doi:10.1101/gr.7.7.705

References This article cites 58 articles, 12 of which can be accessed free at: http://genome.cshlp.org/content/7/7/705.full.html#ref-list-1


Cold Spring Harbor Laboratory Press