Estimate of the Mutation Rate Per Nucleotide in Humans
Total Page:16
File Type:pdf, Size:1020Kb
Copyright 2000 by the Genetics Society of America Estimate of the Mutation Rate per Nucleotide in Humans Michael W. Nachman and Susan L. Crowell Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721 Manuscript received July 24, 1999 Accepted for publication May 19, 2000 ABSTRACT Many previous estimates of the mutation rate in humans have relied on screens of visible mutants. We investigated the rate and pattern of mutations at the nucleotide level by comparing pseudogenes in humans and chimpanzees to (i) provide an estimate of the average mutation rate per nucleotide, (ii) assess heterogeneity of mutation rate at different sites and for different types of mutations, (iii) test the hypothesis that the X chromosome has a lower mutation rate than autosomes, and (iv) estimate the deleterious mutation rate. Eighteen processed pseudogenes were sequenced, including 12 on autosomes and 6 on ϫ 10Ϫ8 mutations per nucleotide 2.5ف the X chromosome. The average mutation rate was estimated to be site or 175 mutations per diploid genome per generation. Rates of mutation for both transitions and transversions at CpG dinucleotides are one order of magnitude higher than mutation rates at other sites. Single nucleotide substitutions are 10 times more frequent than length mutations. Comparison of rates of evolution for X-linked and autosomal pseudogenes suggests that the male mutation rate is 4 times the female mutation rate, but provides no evidence for a reduction in mutation rate that is speci®c to the X chromosome. Using conservative calculations of the proportion of the genome subject to purifying selec- tion, we estimate that the genomic deleterious mutation rate (U) is at least 3. This high rate is dif®cult to reconcile with multiplicative ®tness effects of individual mutations and suggests that synergistic epistasis among harmful mutations may be common. UTATION is the ultimate source of genetic varia- 1997). Methods that rely on screens of disease pheno- M tion; it is both the substrate for evolution and types may provide an underestimate of the actual muta- the cause of genetic disease. Most previous estimates of tion rate because not all nucleotide changes in a disease the human mutation rate have utilized one of three gene will result in disease. On the other hand, some approaches. Two of these approaches rely on pheno- disease genes are likely to have been identi®ed and typic differences associated with diseases and the third studied in part because they have high mutation rates. approach relies on direct comparison of DNA sequences The third approach to measuring the human mutation without function. These phenotypic and molecular rate takes advantage of the well-known result that for methods are fundamentally different and rely on differ- neutral mutations, the mutation rate is equal to the rate ent assumptions. The ®rst approach, pioneered by of evolution (Kimura 1968). A direct comparison of Haldane (1932, 1935), assumes diseases are in muta- stretches of DNA without function can provide an esti- tion-selection balance. For recessive mutations, the equi- mate of the mutation rate per generation between spe- librium frequency of mutant alleles is √/s, where is cies whose divergence time and generation length are the mutation rate and s is the selective effect of the known. Comparisons of pseudogenes and of synony- deleterious mutation (Haldane 1927). The mutation mous sites between humans and chimpanzees have sug- Ϫ rate can be estimated if the frequency of mutant alleles gested mutation rates on the order of 10 8 per site per and the strength of selection are known. Using this generation (e.g., Kondrashov and Crow 1993; Drake -nonsyn 103ف method, Haldane (1932) calculated that the per locus et al. 1998). Since many genes may contain 10Ϫ5 onymous sites, this estimate is in reasonable agreementف rate of mutation for hemophilia in humans is Ϫ per generation. The second approach involves counts with per locus rates of 10 5 (Vogel and Motulsky of affected individuals born to unaffected parents for 1997). dominant disorders (Cooper and Krawczak 1993). Here we are interested in extending this work to ob- This method has been used for many dominant diseases tain a more precise estimate of the rate and pattern of in humans, and rates per locus vary from 10Ϫ6±10Ϫ4 mutation at the nucleotide level in humans. We have per generation (summarized in Vogel and Motulsky sequenced 18 processed pseudogenes in humans and chimpanzees, including 12 on autosomes and 6 on the X chromosome. First, we provide an estimate of the average underlying mutation rate per nucleotide site. Corresponding author: Michael W. Nachman, Department of Ecology and Evolutionary Biology, Biosciences West Bldg., University of Ari- Second, we compare mutation rates for different sites zona, Tucson, AZ 85721. E-mail: [email protected] and for different classes of mutation to evaluate hetero- Genetics 156: 297±304 (September 2000) 298 M. W. Nachman and S. L. Crowell geneity of mutation rate. Third, we compare rates of t divergence on the X chromosome and on autosomes K to evaluate the hypothesis that the X chromosome has a lower mutation rate than the autosomes (McVean and Hurst 1997). Finally, we provide an approximation i of the genomic deleterious mutation rate by considering K the total mutation rate and the fraction of the genome that is subject to constraint (Kondrashov and Crow 1993). n K MATERIALS AND METHODS Samples: For each locus, two humans and one common chimpanzee were surveyed. Human genomic DNAs were pro- tv K ). Mean values are weighted by the number vided by Dr. M. F. Hammer from the Y chromosome consor- t K tium DNA repository and represent one African male and one ) at CpG sites and at non-CpG sites, all single tv Caucasian male. Male chimpanzee (Pan troglodytes) DNA was K provided by Dr. O. A. Ryder. ts PCR ampli®cation and DNA sequencing: Eighteen pro- K cessed pseudogenes were PCR ampli®ed (Saiki et al. 1988) directly from genomic DNA. The names and chromosomal locations for each pseudogene are given in Tables 1 and 2. PCR conditions were optimized for each locus. Typically, PCR Њ was performed in 25- l volumes with 40 cycles of 94 1 min, tv 55Њ 1 min, and 72Њ 2 min. PCR and sequencing primers were K ) and transversions ( designed from published human sequences with the following ts K accession numbers: gamma-cytoplasmic actin (Actgp3), D50- 657; ␣-enolase, X15277; connexin 43, M65189; cytochrome b, AC002087; C4-sterol methyl oxidase (Desp4), U93261; elonga- tion factor 1-␣ (Ef1 alpha), AC002086; Ferritin, U46066; ␣-1, ts 3-galactosyltransferase (Hgt-2), M60263; interferon-induced K 56-kD protein (II56), Z74739; lanosterol 14-␣demethylase (Cyp51), U40053; malate dehydrogenase, Z93019; NADH de- TABLE 1 hydrogenase, Z81369; proliferation-associated gene (Pag), X72297; GPI-anchor synthesis gene (PIGF), D49727; regula- tory subunit RI ␣ of cAMP-dependent protein kinase (RI alpha), X73110; adaptor protein (Shc), Y09846; Translin, AC002075; HTLV-1 enhancer-binding protein (Txreb), U03712. For each locus, at least one ampli®cation primer was designed 1980) is given for transitions ( to lie outside of the pseudogene. By utilizing processed pseu- dogenes, we were able to generate a sequence across the site of integration for each locus in both chimpanzees and hu- Kimura mans. This allowed us to con®rm that we were comparing Rates of evolution for autosomal processed pseudogenes orthologous pseudogenes that had integrated prior to the 902 0.457 0.0063 0.0029 0.0875 0.0232 0.0121 0.0012 0.0133 human-chimpanzee divergence. Patterns of substitution fur- Length Non-CpG Non-CpG CpG CpG ), and total changes including both point mutations and insertion-deletions ( ther con®rmed that all loci were pseudogenes: nonsynony- i mous substitutions outnumbered synonymous substitutions K and many genes had frameshift mutations. Products were cy- cle-sequenced on both strands and run on an ABI 377 auto- mated sequencer. No differences were detected between 1 872 0.426 0.0082 0.0015 0.0883 0.0883 0.0118 0.0034 0.0154 strands, suggesting that the error rate due to DNA sequencing is likely to be low. The amount of sequence generated for each locus is given in Tables 1 and 2; the average sequence length for each autosomal locus was 902 bp and the average sequence length for each X-linked locus was 877 bp. A total Ϯ ), insertion-deletions ( of 16,089 bp was sequenced in each individual. Sequences n have been submitted to GenBank under accession nos. AF- K 196978±AF197019. Data analysis: Chromatograms were scored by hand and sequences were aligned manually. Heterozygous sites were con®rmed on both strands. Divergence between human and chimpanzee was calculated as the average pairwise difference between the two chimpanzee alleles and the four human al- ␣ Divergence per site between human and chimpanzee ( standard errors 39 0.011 0.0008 0.0005 0.0173 0.0086 0.0011 0.0003 0.0011 leles using Kimura's two-parameter model (Kimura 1980). -Enolase 1 1034 0.505 0.0051 0.0010 0.1116 0.0504 0.0119 0.0000 0.0119 Cytoplasmic actin␣ ConnexinCytochrome bHgt-2Interferon proteinLanosterolNADH dehydrogenasePag 20PigfR1 13 5 22 5 955 13 12 0.528 1151 713 718 768 1048 9 0.0111 873 0.464 0.453 0.421 5 0.474 0.0066 0.456 0.462 0.0056 0.0075 0.0029 0.0067 883 865 0.1701 0.0025 0.0018 0.0028 0.0000 0.0039 0.0040 0.0000 0.470 0.384 0.1134 0.1203 0.0698 0.0245 0.0030 0.1019 0.0024 0.0000 0.0093 0.0000 0.0000 0.0010 0.0620 0.0112 0.0517 0.0000 0.0107 0.0117 0.0042 0.0256 0.0148 0.0000 0.0070 0.0000 0.0041 0.0000 0.0000 0.0000 0.0013 0.0072 0.0060 0.1203 0.0107 0.0117 0.0042 0.1585 0.0161 0.0019 0.0000 0.0000 0.0751 0.0091 0.0060 0.0183 0.0187 0.0034 0.0023 0.0218 0.0211 TranslinMean autosomal values 7 944 0.443 0.0022 0.0008 0.0000 0.0613 0.0050 0.0011 0.0061 Locus Chromosome (bp) GC content nucleotide changes ( Standard errors were also calculated from this model.