
J Mol Evol (2001) 52:232–238 DOI: 10.1007/s002390010151

© Springer-Verlag New York Inc. 2001

Phylogenetic Analysis of the Friedreich Ataxia GAA Trinucleotide Repeat

Cristina M. Justice,1,* Zhining Den,1 Son V. Nguyen,2 Mark Stoneking,6 Prescott L. Deininger,7 Mark A. Batzer,1–5 Bronya J.B. Keats1,4,5

1 Department of Genetics, Louisiana State University Health Sciences Center, New Orleans, LA 70112, USA
2 Department of Pathology, Louisiana State University Health Sciences Center, New Orleans, LA 70112, USA
3 Department of Biochemistry and Molecular Biology, Louisiana State University Health Sciences Center, New Orleans, LA 70112, USA
4 Stanley S. Scott Cancer Center, Louisiana State University Health Sciences Center, New Orleans, LA 70112, USA
5 Neuroscience Center of Excellence, Louisiana State University Health Sciences Center, New Orleans, LA 70112, USA
6 Max Planck Institute for Evolutionary Anthropology, Inselstrasse 22, D-04103 Leipzig, Germany
7 Tulane Cancer Center, Department of Environmental Health Sciences, Tulane University Health Sciences Center, New Orleans, LA 70112, USA

Received: 18 August 2000 / Accepted: 10 November 2000

Abstract. Friedreich ataxia is an autosomal recessive neurodegenerative disorder associated with a GAA repeat expansion in the first intron of the gene (FRDA) encoding a novel, highly conserved, 210-amino acid protein known as frataxin. Normal variation in repeat size was determined by analysis of more than 600 DNA samples from seven populations. This analysis showed that the most frequent allele had nine GAA repeats, and no alleles with fewer than five GAA repeats were found. The European and Syrian populations had the highest percentage of alleles with 10 or more GAA repeats, while the Papua New Guinea population did not have any alleles carrying more than 10 GAA repeats. The distributions of repeat sizes in the European, Syrian, and African American populations were significantly different from those in the Asian and Papua New Guinea populations (p < 0.001). The GAA repeat size was also determined in five nonhuman primates. Samples from 10 chimpanzees, 3 orangutans, 1 gorilla, 1 rhesus macaque, 1 mangabey, and 1 tamarin were analyzed. Among those belonging to the Pongidae family, the chimpanzees were found to carry three or four GAA repeats, the orangutans had four or five GAA repeats, and the gorilla carried three GAA repeats. In primates belonging to the Cercopithecidae family, three GAA repeats were found in the mangabey and two in the rhesus macaque. However, an AluY subfamily member inserted in the poly(A) tract preceding the GAA repeat region in the rhesus macaque, making the amplified sequence approximately 300 bp longer. The GAA repeat was also found in the tamarin, suggesting that it arose at least 40 million years ago and remained relatively small throughout the majority of primate evolution, with a punctuated expansion in the human lineage.

Key words: Friedreich ataxia — GAA trinucleotide repeat — Phylogenetic analysis — — Nonhuman primate — FRDA gene

* Present address: Genometrics Section, National Research Institute, NIH, Baltimore, MD 21224, USA

Correspondence to: Dr. Bronya J.B. Keats, Department of Genetics, Louisiana State University Health Sciences Center, 533 Bolivar Street, New Orleans, LA 70112, USA; Fax: 504-568-8500; e-mail: [email protected]

Introduction

Friedreich ataxia is an autosomal recessive disorder in which there is marked degeneration of the central and peripheral nervous systems. It is associated with an unstable GAA trinucleotide repeat within the first intron of the gene (FRDA) encoding a highly conserved, 210-amino acid protein of unknown function called frataxin (Campuzano et al. 1996). Cell lines and tissues from patients show reduced levels of the mature transcript as well as the protein, which has been localized to the mitochondrial membrane (Babcock et al. 1997; Campuzano et al. 1997; Koutnikova et al. 1997; Priller et al. 1997). The GAA trinucleotide repeat is found in the middle of an Alu sequence and is flanked by a 13-bp direct repeat (AAAATGGATTTCC) (Montermini et al. 1997a).

The majority of patients with Friedreich ataxia are homozygous for the GAA trinucleotide repeat expansion (Campuzano et al. 1996; Dürr et al. 1996; Filla et al. 1996; Montermini et al. 1997b). Compound heterozygotes with an expansion in one allele and a point mutation in the other allele account for less than 5% of patients (Bidichandani et al. 1997; Botti et al. 1997; Campuzano et al. 1996; Cossée et al. 1997; Dürr et al. 1996; Filla et al. 1996; Forrest et al. 1997; Labuda et al. 1997; Monrós et al. 1997).

An initial study of a population of individuals of European descent reported that normal-sized alleles contained from 8 to 22 GAA repeats (Campuzano et al. 1996). Epplen et al. (1997) extended the range to between 7 and 29 GAA repeats in 178 healthy German individuals. Montermini et al. (1997a) found that 83% of normal chromosomes had between 6 and 10 GAA repeats, with half of those having 9 GAA repeats, but they also found alleles with more than 30 repeats. Similarly, Cossée et al. (1997) examined a French sample and found that approximately 50% of normal-sized alleles had nine GAA repeats.

In this study we have analyzed DNA samples from seven human populations to determine the amount of genetic variation in the size of the GAA repeat within and among human population groups. In addition, we included several nonhuman primates to understand more about the evolution of the GAA trinucleotide repeat.

Materials and Methods

This study comprised genomic DNA samples from 661 unrelated and unaffected individuals belonging to seven population groups: 222 Europeans, 105 African Americans, 69 Syrians, 67 Asians, 69 African Bantu speakers, 31 African Khoisan speakers, and 98 Papua New Guineans. The European population was composed of 125 individuals of northern European descent residing in the United States and 97 from Switzerland. The Bantu speakers were primarily from South Africa, while most Khoisan speakers originated from Botswana. The Asian population was composed of 28 Indonesians and 39 Thai, and the Papua New Guinea group included 61 individuals from the Highlands and 37 individuals from the Coastal regions. Additionally, we analyzed genomic DNA from 10 common chimpanzees (Pan troglodytes), 3 orangutans (Pongo pygmaeus), 1 gorilla (Gorilla gorilla), 1 rhesus macaque (Macaca mulatta), 1 sooty mangabey (Cercocebus torquatus atys), and 1 tamarin (Saguinus oedipus). Six of the chimpanzee samples were from unrelated or distantly related individuals from the Southwest Foundation for Biomedical Research, located in San Antonio, Texas. The rhesus macaque and mangabey samples were from the Delta Regional Primate Center, Covington, Louisiana. The rest of the primate samples were obtained from BIOS.

GAA Repeat Analysis of Human Populations

Identification of the GAA repeat was done by PCR analysis using primer 629-R (Filla et al. 1996) radiolabeled with [γ-32P]ATP, and primer GAA1-F (5′-CGGAGTTCAAGACTAACCTGGCC-3′). The 15-μl PCR reaction mixture consisted of 80 ng of genomic DNA, 100 ng of GAA1-F, 6 ng of 629-R, a 200 μM concentration of dNTPs (dGTP, dATP, TTP, dCTP), 1× magnesium-free buffer (Promega), 6 mM MgCl2, and 1 U of Taq polymerase. Cycling conditions were as follows: denaturation for 1 min at 94°C; 15 cycles at 94°C for 20 s and 68°C for 90 s; an additional 15 cycles in which the length of the 68°C step was increased by 10 s per cycle; and a final extension for 10 min at 72°C in a Perkin–Elmer GeneAmp Thermal Cycler 9600.

The PCR products were electrophoresed on a 6% denaturing polyacrylamide gel for 4 h. They were compared to a 100-bp ladder radiolabeled with [γ-32P]ATP and run next to reference alleles whose GAA repeat length had been determined through sequencing. The repeat length was obtained by subtracting 324 bp from the fragment size and dividing the result by 3. Because of the variable poly(A) tract preceding the GAA repeat, the sizing accuracy was estimated at plus or minus one GAA repeat.

To detect any expanded alleles, amplifications were conducted with primers GAA-629R and GAA-104F (Filla et al. 1996) using the Expand High Fidelity PCR System (Boehringer Mannheim). The 15-μl PCR reaction mixture consisted of 400 ng of DNA, a 300 nM concentration of each primer, a 200 μM concentration of dNTPs, 10× concentration of buffer, 1.2 mM MgCl2, and 2 U of DNA polymerase mix. Cycling conditions were as follows: denaturation for 1 min at 94°C; 17 cycles at 94°C for 20 s and 68°C for 150 s; 15 cycles in which the length of the 68°C step was increased by 20 s per cycle; and a final extension for 10 min at 68°C. To determine the presence or absence of an expanded allele, a Southern blot was performed using a (GAA)7 probe end-labeled with [γ-32P]ATP. The sizing of several normal alleles was checked through direct sequencing using Thermo Sequenase (Amersham), with 629-R as an internal primer. The PCR fragments were separated on a 1% agarose gel and purified with the QIAEX II extraction kit (Qiagen) prior to sequencing.

GAA Repeat Analysis in Nonhuman Primates

PCR amplifications in primates were performed using primers GAA-629R and GAA-104F (Filla et al. 1996). If no product was obtained, primers Bam and 2500F (Campuzano et al. 1996) were used. The primers used to amplify the tamarin DNA were FRDA-F (5′-ATTTGGCCCACATTGTGTTT-3′) and FRDA-R (5′-CCACACTTGCCTATTTTTCCA-3′). In humans, these primers produce a 480-bp fragment. Direct sequencing of all primate samples was performed to determine the number of GAA repeats.

Statistical Analyses

The alleles from the seven populations were assigned to two groups according to whether they had 9 or fewer GAA repeats or 10 or more GAA repeats. This yielded a 7 × 2 table with no empty cells. A χ2 test, using the SAS computer program, was performed to test for significant differences in GAA repeat distribution among the population groups. Because 21 pairwise comparisons were performed, significance was considered only if the p value was less than 0.002. Values up to 0.005 were considered indicative of a significant trend. Heterozygosity values were calculated using the program PIC from the LINKAGE UTILITY package described by Ott (1991).

Results

We found that in our human population samples the GAA trinucleotide repeat size varied from 5 to 50 GAA
repeats (Fig. 1). The most common allele in all the populations had nine GAA repeats. The percentage of alleles with 5 to 10 repeats was about 80% in Europeans, Syrians, and Bantu speakers, approximately 90% in African Americans, and over 95% in Asians, Papua New Guineans, and Khoisan speakers. In fact, no alleles in the Papua New Guinea population had more than 10 GAA repeats, and only one allele in the Asian population had more than 10 repeats. The heterozygosity values also varied widely, ranging from 0.61 in the Papua New Guineans to 0.83 in the Bantu speakers. The heterozygosity value for Europeans (0.71) was similar to that previously reported (0.72) in a German population (Epplen et al. 1997). Surprisingly, no alleles in the pathological range (more than ∼80 repeats) were found in any of our samples. With a Friedreich ataxia (FA) frequency of about 1/50,000 within the European population, we would have expected about 6 individuals in the total sample of 661 individuals to be carriers of an expanded allele. However, if only the Europeans are included (222 individuals), then we would expect two expanded alleles. Because the PCR-based analysis is designed to detect expanded alleles, we believe that the lack of expanded alleles within the sample is the result of the limited sampling.

Fig. 1. Size distribution of GAA repeat alleles in humans. The populations are shown along with the sample size (number of alleles), in parentheses. The heterozygosity values (H) are also shown.

The distributions of allele sizes in the Europeans, Syrians, African Americans, and Bantu speakers were not significantly different from each other (Table 1). However, the Papua New Guineans were significantly different from each of these populations. The Asian population was significantly different from the Europeans, Syrians, and Bantu speakers (p < 0.001). The Khoisan speakers were significantly different only from the Syrians (p < 0.001), although the comparison between the Khoisan speakers and the Bantu speakers approached significance (χ2 = 8.329, p < 0.004). The lack of significance may be due to the small size of the Khoisan sample (31 individuals).

Table 1. Comparison of GAA size distribution among population groups

                             Syrian     African    Bantu      Khoisan    Asian      Papua New
                                        American   speakers   speakers              Guinean
European            χ2       4.658      0.256      0.723      6.723      11.252     21.763
                    p value  0.033      0.613      0.395      0.010      0.001a     0.001a
Syrian              χ2                  5.076      0.986      12.304     20.290     34.284
                    p value             0.024      0.321      0.001a     0.001a     0.001a
African American    χ2                             1.307      5.249      7.972      16.027
                    p value                        0.253      0.022      0.005b     0.001a
Bantu speakers      χ2                                        8.329      13.093     23.571
                    p value                                   0.004b     0.001a     0.001a
Khoisan speakers    χ2                                                   0.102      0.066
                    p value                                              0.749      0.797
Asian               χ2                                                              0.615
                    p value                                                         0.433

a Significant. b Borderline significant.

GAA Repeats in Other Primates

We obtained and sequenced PCR products from chimpanzees, orangutans, gorilla, rhesus macaque, and mangabey. We were also successful in amplifying New World monkey DNA (tamarin) after designing new primers from the rhesus macaque and mangabey sequences (Fig. 2). No PCR products were obtained from the amplification of mouse, rat, hamster, guinea pig, sheep, cow, pig, dog, or cat DNA samples.

Analysis of the 10 chimpanzees showed that 4 were homozygous for four GAA repeats, 1 was homozygous for three GAA repeats, and the other 5 chimpanzees had three GAA repeats on one allele and four on the other. The 13-bp direct repeat sequence flanking the Alu element was identical to that found in humans. Analysis of 336 bp between the direct repeats (excluding the GAA repeats) gave 11 nucleotides that differed from the human sequence. (In humans, the corresponding number of base pairs is 340.) The Alu element poly(A) tract was similar in size to that found in humans, varying from 15 to 19 adenines in the chimpanzees. The gorilla was homozygous for three GAA repeats. In addition to a 4-bp insertion, 5 of the 344 bp between the direct repeats differed from those in humans. Gonzalez-Cabo et al. (1999) obtained similar sequences for DNA samples from chimpanzees and gorillas at the Barcelona Zoo in Spain. In the three orangutan DNA samples that were analyzed, we found four or five GAA repeats. Again, the 13-bp direct repeat sequence flanking the Alu element was identical to that in humans, and the poly(A) tract was similar to the human length. In the 332 bp between the direct repeats, 16 bp differed from the human sequence (Fig. 2).

The mangabey DNA sample was homozygous for three GAA repeats followed by GAAAAAGAA (Fig. 2b). The 13-bp flanking repeats were again identical to those in humans, but in the 338 bp between these repeats, 28 differences in nucleotides were found. The poly(A) tract leading into the GAA repeats had 14 adenines, which is within the range found in humans (14–18).

Amplification of the rhesus macaque DNA gave a PCR product that was about 300 bp larger in size than that amplified in the other primates and humans. The GAA repeats were not found in the expected location based on results from the other primates. However, after sequencing 814 bases, both of the 13-bp flanking direct repeat sequences were identified. The direct repeat sequence from the macaque was identical to that in humans, but it flanked a DNA fragment that was almost 300 bp larger than the DNA fragment found in humans and the other nonhuman primates analyzed. Analysis using RepeatMasker2 (www.ftp.genome.washington.edu) showed that about 95% of the additional sequence in the rhesus macaque was composed of an AluY subfamily member. Therefore, the AluY element integrated in the poly(A) tract that precedes the GAA repeat region. Gonzalez-Cabo et al. (1999) also found this insert in a rhesus macaque from the Barcelona Zoo in Spain. No direct repeat sequences, other than several adenines, were found flanking the AluY element insertion. The AluY sequence was followed by a GAA sequence motif very similar to the one found in the mangabey genome. In the remaining 338 bp, 30 nucleotides differed from the human sequence (Fig. 2).

Amplification of the tamarin DNA generated a PCR product of approximately 480 bp. This product was similar to the human sequence, with two GAA repeats followed by GAAAAAGAA (Fig. 2). The 13-bp direct repeat sequences flanking the Alu element were identical to those found in humans. In the 332 bp between the direct repeats, 27 nucleotides were different
from the human sequence. The poly(A) tract was truncated, with only 9 bases, compared to the 17 bases found in the human genome.

Fig. 2. FA repeat region DNA sequences from nonhuman primates. The DNA sequences from the FA repeat regions of a number of nonhuman primates are shown. Asterisks represent the same base as in the human sequence. Differing bases are denoted by the appropriate nucleotide, and missing bases are denoted by a dash.

Discussion

In our study the number of GAA repeats in most normal-sized alleles in the human populations varied from 5 to 25, with the majority having fewer than 10. Additional alleles with 34, 42, 45, and 50 repeats were also identified. These large normal alleles are likely to be unstable premutation alleles. Four were in Europeans, two in African Americans, and one in an Asian individual. They may all be of European origin, with the three in non-Europeans being the result of admixture (Parra et al. 1998). The absence of alleles in the premutation range in populations for which European admixture is rare is consistent with the suggestion that Friedreich ataxia is a European/Middle Eastern disease. In fact, haplotype analyses by Labuda et al. (2000) suggest that all alleles with more than 10 repeats may have a single African origin and that none of these alleles migrated to Southeast Asia. We found no alleles with more than 10 repeats in the Papua New Guineans and only one in the Asians, which supports this conclusion. The difference between the Khoisan and the Bantu speakers from Africa is consistent with the fact that extensive genetic diversity exists between the population groups in Africa (Stoneking et al. 1997).

The AluY element was found within the middle A-rich region of the AluSx element only in the rhesus macaque. This provides evidence that the insertion occurred in the Old World monkey lineage after the divergence between the rhesus macaque and the mangabey. This is not surprising, because the AluY subfamily has retroposed within primate genomes over the last 25–35 million years (Deininger and Batzer 1993). The recent integration of this Alu element may be a useful tool for the study of phylogenetic relations among Old World monkeys (Fig. 3).

Fig. 3. Phylogeny of the FRDA Alu changes. The consensus Alu is represented above, with its middle A-rich sequence shown. The FRDA Alu is shown below. The Alu middle A-rich regions (including the GAA repeat) are shown for each of the primates, together with an approximate tree for primate divergence. The AluY inserted sometime after the divergence of the macaque from the mangabey. The various A-length and GAA repeat-length changes occurred along their specific lineages. The AluSx insertion occurred before the divergence of the various monkeys. The expanded middle A-rich region and GAA repeat probably formed after the insertion of the Alu and before the divergence of the monkeys, but may have formed at the time of Alu integration into the genome.

The Friedreich ataxia GAA trinucleotide repeat lies within the middle A-rich region of an Alu element. The repeat is found only in primate genomes, as would be expected because Alu elements are primate specific (Deininger et al. 1981). The generation of a trinucleotide simple sequence repeat from an Alu element A-rich region is also interesting, because these regions of Alu elements have previously been shown to serve as nuclei for the genesis of other simple sequence repeats (Arcot et al. 1995). The primate order arose 65 million years ago, and New World monkeys diverged from the prosimian primates about 55 million years ago. The GAA trinucleotide is found within the genomes of New and Old World monkeys but not in nonprimate genomes. The trinucleotide repeat therefore originated at least 40 million years ago but not more than 65 million years ago. It seems likely that the middle A-rich region underwent an early expansion, followed by the genesis of the first GAA repeats. This GAA repeat region does show size variation during primate evolution, but it remained relatively short throughout millions of years until the human lineage. Several nucleotide differences in the Alu sequence flanking the GAA repeats were also observed. Chimpanzee, gorilla, and orangutan were more similar to the human sequence than mangabey, macaque, and tamarin, consistent with a neutral rate of evolution for the Alu element.

Acknowledgments. This work was supported by grants from the Muscular Dystrophy Association (to B.J.B.K. and M.A.B.) and the National Ataxia Foundation (to B.J.B.K.), HEF (2000-05)-05 from the Louisiana Board of Regents Health Excellence Fund (to B.J.B.K., M.A.B., and P.L.D.), and National Institutes of Health Grant R01 GM45668 (to P.L.D.).

References

Arcot SS, Wang Z, Weber JL, Deininger PL, Batzer MA (1995) Alu repeats: A source for the genesis of primate . Genomics 29:136–144
Babcock M, de Silva D, Oaks R, et al. (1997) Regulation of mitochondrial iron accumulation by Yfh1p, a putative homolog of frataxin. Science 276:1709–1712
Bidichandani SI, Ashizawa T, Patel PI (1997) Atypical Friedreich ataxia caused by compound heterozygosity for a novel missense mutation and the GAA triplet-repeat expansion. Am J Hum Genet 60:1251–1256
Botti S, Castellotti B, Riggio MC, et al. (1997) X25 gene micromutations in Friedreich's ataxia alleles not carrying the GAA expansions. Am J Hum Genet 61 (Suppl):A327
Campuzano V, Montermini L, Moltó MD, et al. (1996) Friedreich's ataxia: Autosomal recessive disease caused by an intronic GAA triplet repeat expansion. Science 271:1423–1427
Campuzano V, Montermini L, Lutz Y, et al. (1997) Frataxin is reduced in Friedreich ataxia patients and is associated with mitochondrial membranes. Hum Mol Genet 6:1771–1780
Cossée M, Schmitt M, Campuzano V, et al. (1997) Evolution of the Friedreich's ataxia trinucleotide repeat expansion: Founder effect and premutations. Proc Natl Acad Sci USA 94:7452–7457
Deininger PL, Batzer MA (1993) Evolution of . In: Hecht M, MacIntyre RJ, Clegg M (eds). Evolutionary biology, Vol 27. Plenum, New York, pp 157–196
Deininger PL, Jolly DJ, Rubin CM, Friedmann T, Schmid CW (1981) Base sequence studies of 300 nucleotide renatured repeated human DNA clones. J Mol Biol 151:17
Dürr A, Cossée M, Agid Y, et al. (1996) Clinical and genetic abnormalities in patients with Friedreich's ataxia. N Engl J Med 335:1169–1175
Epplen C, Epplen JT, Frank G, Miterski B, Santos EJM, Schöls L (1997) Differential stability of the (GAA)n tract in the Friedreich ataxia (STM7) gene. Hum Genet 99:834–836
Filla A, De Michele G, Cavalcanti F, et al. (1996) The relationship between trinucleotide (GAA) repeat length and clinical features in Friedreich ataxia. Am J Hum Genet 59:554–560
Forrest SM, Delatycki M, Paris D, et al. (1997) The Friedreich's ataxia may originate from a premutation and shows size reduction when transmitted from parent to offspring. Am J Hum Genet 61 (Suppl):A308
Gonzalez-Cabo P, Sanchez MI, Canizarez J, et al. (1999) Incipient GAA repeats in the primate Friedreich ataxia homologous genes. Mol Biol Evol 16:880–883
Koutnikova H, Campuzano V, Foury F, Dollé P, Cazzalini O, Koenig M (1997) Studies of human, mouse and yeast homologues indicate a mitochondrial function for frataxin. Nature Genet 16:345–351
Labuda M, Montermini L, Poirier J, et al. (1997) Molecular analysis of patients with Friedreich ataxia phenotype not homozygous for the GAA triplet repeat expansion. Am J Hum Genet 61 (Suppl):A312
Labuda M, Labuda D, Miranda C, et al. (2000) Unique origin and specific ethnic distribution of the Friedreich ataxia GAA expansion. Neurology 54:2332–2337
Monrós E, Moltó MD, Martínez F, et al. (1997) Phenotype correlation and intergenerational dynamics of the Friedreich ataxia GAA trinucleotide repeat. Am J Hum Genet 61:101–110
Montermini L, Andermann E, Labuda M, et al. (1997a) The Friedreich ataxia GAA triplet repeat: Premutation and normal alleles. Hum Mol Genet 6:1261–1266
Montermini L, Richter A, Morgan K, et al. (1997b) Phenotypic variability in Friedreich ataxia: Role of the associated GAA triplet repeat expansion. Ann Neurol 41:675–682
Ott J (1991) Analysis of human genetic linkage. Johns Hopkins University Press, Baltimore
Parra E, Marcini A, Akey J, et al. (1998) Estimating African-American admixture proportions by use of population-specific alleles. Am J Hum Genet 63:1839–1851
Priller J, Scherzer CR, Faber PW, MacDonald ME, Young AB (1997) Frataxin gene of Friedreich's ataxia is targeted to mitochondria. Ann Neurol 42:265–269
Stoneking M, Fontius JJ, Clifford SL, et al. (1997) Alu insertion polymorphisms and : Evidence for a larger population size in Africa. Genome Res 7:1061–1071
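The sizing rule in the section on GAA Repeat Analysis of Human Populations subtracts 324 bp from the GAA1-F/629-R fragment size and divides the result by 3. It can be written as a small Python function. This is a sketch, not the authors' code. The function name is hypothetical. Rounding to the nearest whole repeat is our assumption, since the text only says that sizing is accurate to about plus or minus one repeat.

```python
def gaa_repeats_from_fragment(fragment_bp: float, flank_bp: int = 324) -> int:
    """Estimate the GAA repeat number from a PCR fragment size in bp.

    Implements the rule stated in Materials and Methods: subtract the 324 bp
    of non-repeat sequence, then divide by 3 (one GAA unit is 3 bp).
    Rounding to the nearest integer is an assumption of this sketch; the
    quoted accuracy is about +/- 1 repeat because of the variable poly(A) tract.
    """
    if fragment_bp < flank_bp:
        raise ValueError("fragment is shorter than the non-repeat flanking sequence")
    return round((fragment_bp - flank_bp) / 3)


if __name__ == "__main__":
    # Hypothetical fragment sizes, chosen only to illustrate the arithmetic.
    for size in (339, 351, 474):
        print(f"{size} bp -> {gaa_repeats_from_fragment(size)} GAA repeats")
```

Under this rule, fragments of 339, 351, and 474 bp correspond to 5, 9, and 50 repeats. These are the smallest, the most common, and the largest normal human alleles reported in Results.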
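Statistical Analyses and Table 1 describe pairwise 2 × 2 comparisons among seven populations. Each compares two populations by the number of alleles with 9 or fewer versus 10 or more repeats. The Python sketch below shows one way to compute such a test: an uncorrected Pearson χ2 with 1 degree of freedom, with the p value obtained from the complementary error function. It is not the SAS procedure the authors used, and the text does not say whether a continuity correction was applied.

Feeding the published χ2 values back through the hypothetical helper `chi2_p_1df` reproduces most of the p values in Table 1 (for example, 0.256 gives 0.613 and 8.329 gives 0.004). The thresholds of 0.002 and 0.005 also match 0.05/21 and 0.10/21 rounded to one significant figure. This suggests, though the text does not state it, a Bonferroni-style adjustment. The allele counts passed to `pearson_chi2_2x2` below are hypothetical.

```python
import math
from itertools import combinations


def chi2_p_1df(chi2: float) -> float:
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]].

    Rows: two populations. Columns: number of alleles with <= 9 and >= 10
    GAA repeats, the grouping described under Statistical Analyses.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column total must be non-zero")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


populations = ["European", "Syrian", "African American", "Bantu speakers",
               "Khoisan speakers", "Asian", "Papua New Guinean"]
n_tests = len(list(combinations(populations, 2)))  # 21 pairwise comparisons

if __name__ == "__main__":
    print(n_tests, 0.05 / n_tests, 0.10 / n_tests)
    for published in (0.256, 8.329):  # chi-square values taken from Table 1
        print(published, round(chi2_p_1df(published), 3))
    stat = pearson_chi2_2x2(80, 20, 95, 5)  # hypothetical allele counts
    print(round(stat, 3), chi2_p_1df(stat))
```

One published value does not round-trip exactly: 4.658 (Europeans vs. Syrians) gives about 0.031 rather than the tabulated 0.033. The sketch should therefore be read as an approximation of the reported analysis rather than a reproduction of it.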
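Heterozygosity was computed with the PIC program from the LINKAGE utilities (Ott 1991). We have not checked that program's exact formulas. The Python sketch below assumes the standard expressions instead: expected heterozygosity H = 1 − Σ p_i² and polymorphism information content PIC = H − Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of the i-th repeat-size allele. No small-sample correction is applied, and the repeat sizes in the example are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(repeat_sizes):
    """Map each observed GAA repeat size to its relative frequency."""
    counts = Counter(repeat_sizes)
    total = sum(counts.values())
    return {size: n / total for size, n in counts.items()}


def heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """Polymorphism information content: H minus the sum over allele pairs of 2 p_i^2 p_j^2."""
    ps = list(freqs.values())
    return heterozygosity(freqs) - sum(2 * (pi * pj) ** 2 for pi, pj in combinations(ps, 2))


if __name__ == "__main__":
    # Hypothetical repeat sizes for ten chromosomes, for illustration only.
    sample = [9, 9, 9, 9, 8, 8, 10, 10, 6, 12]
    freqs = allele_frequencies(sample)
    print(round(heterozygosity(freqs), 3), round(pic(freqs), 3))
```

For the hypothetical sample above, H works out to 0.74, in the same range as the 0.61 to 0.83 reported across the human populations.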
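Results states that a Friedreich ataxia frequency of about 1/50,000 implies about 6 carriers among 661 individuals and about 2 among 222 Europeans. The text does not spell out the calculation. One way to reproduce it assumes Hardy–Weinberg proportions and that affected individuals are homozygous for the expansion. The expanded-allele frequency is then q = √(1/50,000) ≈ 0.0045, and the carrier frequency is 2pq ≈ 0.0089 (about 1 in 112). A minimal sketch under those assumptions:

```python
import math


def carrier_frequency(disease_prevalence: float) -> float:
    """Carrier (heterozygote) frequency 2pq under Hardy-Weinberg proportions.

    Assumes affected individuals are homozygous for the expansion, so the
    expanded-allele frequency is q = sqrt(prevalence).
    """
    q = math.sqrt(disease_prevalence)
    p = 1.0 - q
    return 2.0 * p * q


def expected_carriers(n_individuals: int, disease_prevalence: float) -> float:
    """Expected number of carriers among n unrelated individuals."""
    return n_individuals * carrier_frequency(disease_prevalence)


if __name__ == "__main__":
    prevalence = 1 / 50_000  # FA frequency quoted in Results
    for label, n in (("all populations", 661), ("Europeans only", 222)):
        print(label, round(expected_carriers(n, prevalence), 2))
```

This gives about 5.9 and 2.0 expected carriers, in line with the figures of 6 and 2 quoted in the text.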
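Fig. 2 encodes each nonhuman primate sequence relative to the human reference:

- an asterisk marks a base identical to human;
- a nucleotide letter marks a differing base;
- a dash marks a missing base.

The Python sketch below tallies one aligned row written in that notation. It gives counts analogous to those reported in Results, such as 11 differing nucleotides in the chimpanzee. The function name and the example row are hypothetical. Treating every letter as a single difference is a simplifying assumption: insertions relative to human, such as the gorilla's 4-bp insertion, would need separate handling.

```python
from collections import Counter


def tally_fig2_row(row: str) -> dict:
    """Tally a sequence row written in the Fig. 2 notation.

    '*' = same base as human, A/C/G/T = a differing base, '-' = a missing base.
    Whitespace is ignored; any other symbol raises ValueError.
    """
    counts = Counter()
    for symbol in row.upper():
        if symbol.isspace():
            continue
        if symbol == "*":
            counts["identical"] += 1
        elif symbol in "ACGT":
            counts["different"] += 1
        elif symbol == "-":
            counts["missing"] += 1
        else:
            raise ValueError(f"unexpected symbol {symbol!r}")
    return {key: counts[key] for key in ("identical", "different", "missing")}


if __name__ == "__main__":
    # Hypothetical row, not taken from Fig. 2.
    print(tally_fig2_row("****A***-**G"))
```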