US 20050123913A1
(19) United States
(12) Patent Application Publication, Wallace et al.
(10) Pub. No.: US 2005/0123913 A1
(43) Pub. Date: Jun. 9, 2005

(54) HUMAN MITOCHONDRIAL DNA POLYMORPHISMS, HAPLOGROUPS, ASSOCIATIONS WITH PHYSIOLOGICAL CONDITIONS, AND GENOTYPING ARRAYS

(75) Inventors: Douglas C. Wallace, Irvine, CA (US); Seyed Hosseini, Duluth, GA (US); Dan Mishmar, Irvine, CA (US); Eduardo Ruiz-Pesini, Irvine, CA (US); Marie Lott, Atlanta, GA (US)

Correspondence Address: Greenlee Winner & Sullivan, Suite 200, 4875 Pearl East Circle, Boulder, CO 80301 (US)

(73) Assignee: Emory University, Atlanta, GA (US)

(21) Appl. No.: 10/488,618

(22) PCT Filed: Aug. 30, 2002

(86) PCT No.: PCT/US02/28471

Related U.S. Application Data

(60) Provisional application No. 60/316,333, filed on Aug. 30, 2001. Provisional application No. 60/380,546, filed on May 13, 2002.

(30) Foreign Application Priority Data

Aug. 31, 2001 (CA) ...... 2356536

Publication Classification

(51) Int. Cl. ...... C12Q 1/68; G06F 19/00; G01N 33/48; G01N 33/50

(52) U.S. Cl. ...... 435/6; 702/20

(57) ABSTRACT

This invention provides human mtDNA polymorphisms that are diagnostic of all the major human haplogroups and methods of diagnosing those haplogroups and selected sub-haplogroups. This invention also provides methods for identifying evolutionarily significant mitochondrial DNA genes, nucleotide alleles, and amino acid alleles. Evolutionarily significant genes and alleles are identified using one or two populations of a single species. The process of identifying evolutionarily significant nucleotide alleles involves identifying evolutionarily significant genes and then evolutionarily significant nucleotide alleles in those genes, and identifying evolutionarily significant amino acid alleles involves identifying amino acids encoded by all nonsynonymous alleles. Synonymous codings of the nucleotide alleles encoding evolutionarily significant amino acid alleles of this invention are equivalent to the evolutionarily significant amino acid alleles disclosed herein and are included within the scope of this invention. Synonymous codings include alleles at neighboring nucleotide loci that are within the same codon. This invention also provides methods for associating haplogroups and evolutionarily significant nucleotide and amino acid alleles with predispositions to physiological conditions. Methods for diagnosing predisposition to LHON, and methods for diagnosing increased likelihood of developing blindness, centenaria, and increased longevity that are not dependent on the geographical location of the individual being diagnosed are provided herein. Diagnosis of an individual with a predisposition to an energy metabolism-related physiological condition is dependent on the geographic region of the individual. Physiological conditions diagnosable by the methods of this invention include healthy conditions and pathological conditions. Physiological conditions that are associated with haplogroups and with alleles provided by this invention include energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease.

[Drawing sheets 1 to 5 of 5 are not reproduced here. The figures are described under BRIEF DESCRIPTION OF THE FIGURES.]

HUMAN MITOCHONDRIAL DNA POLYMORPHISMS, HAPLOGROUPS, ASSOCIATIONS WITH PHYSIOLOGICAL CONDITIONS, AND GENOTYPING ARRAYS

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to U.S. Patent Application Ser. No. 60/316,333 filed Aug. 30, 2001 and Ser. No. 60/380,546 filed May 13, 2002, and to Canadian Patent Application No. 2,356,536 filed on Aug. 31, 2001, which are hereby incorporated in their entirety by reference to the extent not inconsistent with the disclosure herein.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH

[0002] This invention was made in part with funding from the United States Government (NIH grants AG 13154, HL4017, NS21328, and NS37167). The United States Government may have certain rights therein.

BACKGROUND OF THE INVENTION

[0003] Human mitochondrial DNA (mtDNA) is maternally inherited. Mutations accumulate sequentially in radiating lineages, creating branches on the human evolutionary tree. Using sequences of mtDNA, human populations are divisible evolutionarily into haplogroups (Wallace, D. C. et al. (1999) Gene 238:211-230; Ingman M. et al., (2000) Nature 408:708-713; Maca-Meyer, N. (August 2001) BioMed Central 2:13; T. G. Schurr et al., (1999) American Journal of Physical Anthropology 108:1-39; and V. Macaulay et al., (1999) American Journal of Human Genetics 64:232-249). Related haplogroups can be combined into macro-haplogroups. Haplogroups can be subdivided into subhaplogroups. The complete Cambridge mitochondrial DNA sequence may be found at MITOMAP, http://www.gen.emory.edu/cgi-gin/MITOMAP, Genbank accession no. J01415, and is provided in SEQ ID NO:2. Also see Andrews et al. (1999), "Reanalysis and Revision of the Cambridge Reference Sequence for Human Mitochondrial DNA," Nature Genetics 23:147.

[0004] Publications on the subject of mitochondrial biology include: Scheffler, I. E. (1999) Mitochondria, Wiley-Liss, NY; Lestienne P Ed., Mitochondrial Diseases. Models and Methods, Springer-Verlag, Berlin; Methods in Enzymology (2000) 322: Section V Mitochondria and Apoptosis, Academic Press, CA; Mitochondria and Cell Death (1999) Princeton University Press, NJ; Papa S, Ferrucio G, and Tager J Eds.; Frontiers of Cellular Bioenergetics. Molecular Biology, Biochemistry, and Physiopathology, Kluwer Academic/Plenum Publishers, NY; Lemasters, J. and Nieminen, A. (2001) Mitochondria in Pathogenesis, Kluwer Academic/Plenum Publishers, NY; MITOMAP http://www.gen.emory.edu/cgi-gin/MITOMAP; Wallace, D. C. (2001) "A mitochondrial paradigm for degenerative diseases and ageing" Novartis Foundation Symposium 235:247-266; Wallace, D. C. (1997) "Mitochondrial DNA in Aging and Disease" Scientific American August 277:40-47; Wallace, D. C. et al., (1998) "Mitochondrial biology, degenerative diseases and aging," BioFactors 7:187-190; Heddi, A. et al., (1999) "Coordinate Induction of Energy Gene Expression in Tissues of Mitochondrial Disease Patients" JBC 274:22968-22976; Wallace, D. C. (1999) "Mitochondrial Diseases in Man and Mouse" Science 283:1482-1488; Saraste, M. (1999) "Oxidative Phosphorylation at the fin de siecle" Science 283:1488-1493; Kokoszka et al. (2001) "Increased mitochondrial oxidative stress in the Sod2 (+/-) mouse results in the age-related decline of mitochondrial function culminating in increased apoptosis" PNAS 98:2278-2283; Wallace, D. C. (2001) Mental Retardation and Developmental Disabilities 7:158-166; Wallace, D. C. (2001) Am. J. Med. Gen. 106:71-93; Wei, Y-H et al. (2001) Chinese Medical Journal (Taipei) 64:259-270; and Wallace, D. C. (2001) EuroMit 5 Abstract.

[0005] Certain mitochondrial mutations have been associated with physiological conditions (U.S. Pat. No. 6,280,966 issued on Aug. 28, 2001; U.S. Pat. No. 6,140,067 issued on Oct. 31, 2000; U.S. Pat. No. 5,670,320; U.S. Pat. No. 5,296,349; U.S. Pat. No. 5,185,244; U.S. Pat. No. 5,494,794; Wallace, D. C. (1999) Science 283:1482-1488; Brown, M. D. et al. (2001) American Society for Human Genetics Poster #2332; Brown, M. D. et al., (2001) Human Genet. 109:33-39; and Brown, M. D. et al. (January 2002) Human Genet. 110:130-138). Wallace, D. C. et al. (1999) Gene 238:211-230 describes analysis of LHON mutants. Grossman, L. I. et al. (2001) Molecular Phylogenetics and Evolution 18(1):26-36 describes changes in the biochemical machinery for aerobic energy metabolism. Kalman, B. et al. (1999) Acta Neurol. Scand. 99(1):16-25 describes mitochondrial mutations and multiple sclerosis (MS). Wei, Y. H. et al. (2001) Chinese Medical Journal 64:259-270 describes recent results in support of the mitochondrial theory of aging.

[0006] Ivanova, R. et al. (1998) Gerontology 44:349 describes mitochondrial haplotypes and longevity in a French population. Tanaka, M. et al. (1998) Lancet 351:185-186 describes longevity and haplogroups in a Japanese population. De Benedictis, G. et al. (1999) FASEB 13:1532-1536 describes haplogroups and longevity in an Italian population. Rose, G. et al. (2001) European Journal of Human Genetics 9:701-707 describes haplogroup J in centenarians. Ross, O. A. et al. (2001) Experimental Gerontology 36(7):1161-1178 describes haplotypes and longevity in an Irish population.

[0007] Haplogroup T has been associated with reduced sperm motility in European males (E. Ruiz-Pesini et al., 2000 American Journal of Human Genetics 67:682-696), and the tRNA(Gln) np 4336 variant in haplogroup H is associated with late-onset Alzheimer Disease (J. M. Shoffner et al., 1993 Genomics 17:171-184).

[0008] Taylor, R. W. (1997) J. of Bioenergetics and Biomembranes 29(2):195-205 describes methods for treating mitochondrial disease. Colombet, J. and Coutelle, C. (1998) Molecular Medicine Today 4(1):1-8 describes gene therapy for mitochondrial disorders, including using cell fusion to introduce healthy mitochondria. Owen, R. and Flotte, T. R. (2001) Antioxidants and Redox Signaling 3(3):451-460 discuss approaches and limitations to gene therapy for mitochondrial diseases.

[0009] Human mitochondrial DNA sequence variation, except that which has been associated with particular diseases, has not been associated with specific phenotypic conditions, has been considered neutral, and has been used to reconstruct human phylogenies (Henry Gee, "Statistical

Cloud over African Eden," (13 Feb. 1992) Nature 355:583; Marcia Barinaga, "African Eve Backers Beat a Retreat," (7 Feb. 1992) Science, 255:687; S. Blair Hedges et al., "Human Origins and Analysis of Mitochondrial DNA Sequences," (7 Feb. 1992) Science, 255:737-739; Allan C. Wilson and Rebecca L. Cann, "The Recent African Genesis of Humans," (April 1992) Scientific American, 68). The average number of base pair differences between two human mitochondrial genomes is estimated to be from 9.5 to 66 (Zeviani M. et al. (1998) "Reviews in molecular medicine: Mitochondrial disorders," Medicine 77:59-72).

[0010] The D-loop is the most variable region in the mitochondrial genome, and the most polymorphic nucleotide sites within this loop are concentrated in two hypervariable segments, HVS-I and HVS-II (Wilkinson-Herbots, H. M. et al., (1996) "Site 73 in hypervariable region II of the human mitochondrial genome and the origin of European populations," Ann Hum Genet 60:499-508). Population-specific, neutral mtDNA variants have been identified by surveying mtDNA restriction site variants or by sequencing hypervariable segments in the displacement loop. Restriction analysis using fourteen restriction endonucleases allowed screening of 15-20% of the mtDNA sequence for variations (Chen Y. S. et al., (1995) "Analysis of mtDNA variation in African populations reveals the most ancient of all human continent-specific haplogroups," Am J Hum Genet 57:133-149). The large majority of mtDNA sequence data published to date are limited to HVS-I (Bandelt, H. J. et al., (1995) "Mitochondrial portraits of human populations using median networks" Genetics 141:743-753).

[0011] The coding and classification system that has been used for mtDNA haplogroups refers primarily to the information provided by RFLPs and the hypervariable segments of the control region (Torroni, A. et al. (1996) "Classification of European mtDNAs from an analysis of three European populations," Genetics 144:1835-1850 and Richards M B et al., (1998) "Phylogeography of mitochondrial DNA in western Europe," Ann Hum Genet 62:241-260).

[0012] Methods are known for testing the likelihood of neutrality of mutations (Tajima, F. (1989) Genetics 123:585-595; Fu, Y. and Li, W. (1993) Genetics 133:693-709; Li, W. et al. (1985) Mol. Biol. Evol. 2(2):150-174; and Nei, M. and Gojobori, T. (1986) Mol. Biol. Evol. 3(5):418-426). All of the methods in these publications are used to compare datasets taken from separate groups. None of these methods are used to analyze a dataset not containing data representing an outgroup.

[0013] Wise, C. A. et al. (1998) Genetics 148:409-421 describes neutrality analysis of the human mitochondrial NADH Dehydrogenase Subunit 2 gene, when compared to the NADH Dehydrogenase Subunit 2 gene from chimpanzees. Templeton, A. R. (1996) Genetics 144:1263-1270 describes neutrality analysis of the human mitochondrial Cytochrome Oxidase II (COXII) gene when compared to the COXII gene in hominoid primates. Messier, W. and Stewart, C. (1997) Nature 385:151-154 describes neutrality analysis of primate lysozymes. Endo, T. et al. (1996) Mol. Biol. Evol. 13(5):685-690 describes large-scale neutrality analysis of sequences from DDBJ, EMBL, and GenBank databases. Hughes, A. L. and Nei, M. (1988) Nature 335:167-170 describes neutrality analysis of MHC Class I loci. Nachman, M. W. (1996) Genetics 142:953-963 describes neutrality analysis of the human mitochondrial NADH Dehydrogenase Subunit 3 (NADH3) gene, when compared to the NADH Dehydrogenase Subunit 3 gene from chimpanzees. Nachman, M. W. et al. (1994) Proc. Nat. Acad. Sci. USA 76:5269-5273 describes neutrality analysis of the mitochondrial NADH dehydrogenase subunit 3 gene in 3 strains of mouse. Rand, D. M. et al. (1994) Genetics 138:741-756; Ballard, J. W. O. and Kreitman, M. (1994) Genetics 138:757-772; and Kaneko, M. Y. et al. (1993) Genet. Res. 61:195-204 describe neutrality analysis for mitochondrial NADH dehydrogenase subunit 5, Cytochrome b, and ATPase6 in strains of Drosophila.

[0014] In the above-mentioned publications, neutrality testing, including Ka/Ks analysis, has not been applied for the purpose of identifying disease-associated mutations. Populations for neutrality testing analysis were identified by observation of normal phenotypic variation. Neutrality testing has been performed to determine whether a gene is under selection. None of these publications describe neutrality analysis with the purpose of identifying phenotype-associated mutations, and no suspected phenotype-associated mutations were identified.

[0015] U.S. Pat. No. 6,228,586 (issued May 8, 2001) and U.S. Pat. No. 6,280,953 (issued Aug. 28, 2001) describe methods for identifying polynucleotide and polypeptide sequences in human and/or non-human primates, which may be associated with a physiological condition. The methods employ comparison of human and non-human primate sequences using statistical methods. U.S. Pat. No. 6,274,319 (issued Aug. 14, 2001) describes Ka/Ks methods for identifying polynucleotide and polypeptide sequences that may be associated with commercially or aesthetically relevant traits in domesticated plants or animals. The methods employ comparison of homologous genes from the domesticated organism and its wild ancestor to identify evolutionarily significant changes. In the above-mentioned publications, neutrality testing, including Ka/Ks analysis, is only applied to interspecific, not intraspecific, comparisons, and only genes from the nuclear genome, not from organelle genomes, are analyzed.

[0016] Methods for constructing peptide and nucleotide libraries are well known to the art, e.g., as described in U.S. Pat. Nos. 6,156,511 and 6,130,092. Sequencing methods are also known to the art, e.g., as described in U.S. Pat. No. 6,087,095. Arrays of nucleic acid have been used for sequencing and for identifying exceptional alleles including disease-associated alleles. Nucleic acid arrays have been described, e.g., in patent nos.: U.S. Pat. Nos. 5,837,832, 5,807,522, 6,007,987, 6,110,426, WO 99/05324, 99/05591, WO 00/58516, WO 95/11995, WO 95/35505A1, WO 99/42813, JP10503841T2, GR3030430T3, ES2134481T3, EP804731B1, DE69509925C0, CA2192095AA, AU2862995A1, AU709276 B2, AT180570, EP1066506, and AU 2780499. Computational methods are useful for analyzing hybridization results, e.g., as described in PCT Publication WO99/05574, and U.S. Pat. Nos. 5,754,524; 6,228,575; 5,593,839; and 5,856,101. Methods for screening for disease markers are also known to the art, e.g., as described in U.S. Pat. Nos. 6,228,586; 6,160,104; 6,083,698; 6,268,398; 6,228,578; and 6,265,174.

[0017] The development of microarray technologies has stemmed from the desire to examine very large numbers of nucleic acid probe sequences simultaneously, in an effort to obtain information about genetic mutations, gene expression or nucleic acid sequences. Microarray technologies are intimately connected with the Human Genome Project, which has development of rapid methods of nucleic acid sequencing and genome analysis as key objectives (E. Marshall, (1995) Science 268:1270), as well as elucidation of sequence-function relationships (M. Schena et al., (1996) Proc. Natl. Acad. Sci. USA, 93:10614). Microarray hybridization of PCR-amplified fragments to allele-specific oligonucleotide (ASO) probes is widely used in large-scale single nucleotide polymorphism (SNP) genotyping (Huber M. et al. (2002) Analytical Biochemistry 303:25-33 and Southern, E. M. (1996) Trends Genet. 12:110-115).

[0018] The Affymetrix GeneChip® HuSNP™ Array enables whole-genome surveys by simultaneously tracking nearly 1,500 genetic variations, known as single nucleotide polymorphisms (SNPs), dispersed throughout the genome. The HuSNP Affymetrix Array is being used for familial linkage studies that aim to map inherited disease or drug susceptibilities as well as for tracking de novo genetic alterations. For genotyping, arrays rely on multiple probes to interrogate individual nucleotides in a sequence. The identity of a target base can be deduced using four otherwise identical probes that vary only in the target position, each containing one of the four possible bases. Alternatively, the presence of a consensus sequence can be tested using one or two probes representing specific alleles. To genotype heterozygous or genetically mixed samples, arrays with many probes can be created to provide redundant information.

[0019] Arrays, also called DNA microarrays or DNA chips, are fabricated by high-speed robotics, generally on glass but sometimes on nylon substrates, for which probes (Phimister, B. (1999) Nature Genetics 21s:1-60) with known identity are used to determine complementary binding. An experiment with a single DNA chip can provide researchers information on thousands of genes simultaneously. There are several steps in the design and implementation of a DNA array experiment. Many strategies have been investigated at each of these steps: 1) DNA types; 2) Chip fabrication; 3) Sample preparation; 4) Assay; 5) Readout; and 6) Software (informatics).

[0020] There are two major application forms for the array technology: 1) Determination of expression level (abundance) of genes; and 2) Identification of sequence (gene/gene mutation). There appear to be two variants of the array technology, in terms of intellectual property, of arrayed DNA sequence with known identity. Format I consists of probe cDNA (500-5,000 bases long) immobilized to a solid surface such as glass using robot spotting and exposed to a set of targets either separately or in a mixture. This method, "traditionally" called DNA microarray, is widely considered as having been developed at Stanford University (R. Ekins and F. W. Chu, "Microarrays: their origins and applications," 1999 Trends in Biotechnology, 17:217-218). Format II consists of an array of oligonucleotide (20-80-mer oligos) or peptide nucleic acid (PNA) probes synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization. The array is exposed to labeled sample DNA, hybridized, and the identity/abundance of complementary sequences is determined. This method, "historically" called DNA chips, was developed at Affymetrix, Inc., which sells its photolithographically fabricated products under the GeneChip® trademark. Many companies are manufacturing oligonucleotide-based chips using alternative in-situ synthesis or depositioning technologies.

[0021] Probes on arrays can be hybridized with fluorescently-labeled target polynucleotides and the hybridized array can be scanned by means of scanning fluorescence microscopy. The fluorescence patterns are then analyzed by an algorithm that determines the extent of mismatch content, identifies polymorphisms, and provides some general sequencing information (M. Chee et al., 1996 Science 274:610). Selectivity is afforded in this system by low stringency washes to rinse away non-selectively adsorbed materials. Subsequent analysis of relative binding signals from array elements determines where base-pair mismatches may exist. This method then relies on conventional chemical methods to maximize stringency, and automated pattern recognition processing is used to discriminate between fully complementary and partially complementary binding.

[0022] Devices such as standard nucleic acid microarrays or gene chips require data processing algorithms and the use of sample redundancy (i.e., many of the same types of array elements for statistically significant data interpretation and avoidance of anomalies) to provide semi-quantitative analysis of polymorphisms or levels of mismatch between the target sequence and sequences immobilized on the device surface.

[0023] Labels appropriate for array analysis are known in the art. Examples are the two-color fluorescent systems, such as Cy3/Cy5 and Cy3.5/Cy5.5 phosphoramidites (Glen Research, Sterling Va.). Patents covering cyanine dyes include: U.S. Pat. No. 6,114,350 (Sep. 5, 2000); U.S. Pat. No. 6,197,956 (Mar. 6, 2001); U.S. Pat. No. 6,204,389 (Mar. 20, 2001) and U.S. Pat. No. 6,224,644 (May 1, 2001). Array printers and readers are available in the art.

[0024] A process of using arrays is described in Grigorenko, E. V. ed., (2002) DNA Arrays. Technologies and Experimental Strategies, CRC Press, NY; Vrana, K. E. et al., (May 2001) Microarrays and Related Technologies: Miniaturization and Acceleration of Genomics Research, CHI, Upper Falls, Mass.; and Branca, M. A. et al., (February 2002) DNA Microarray Informatics: Key Technological Trends and Commercial Opportunities, CHI, Upper Falls, Mass.

[0025] All publications referred to herein are incorporated by reference to the extent not inconsistent herewith. The mention of a publication in this Background Section does not constitute an admission that it is prior art.

SUMMARY OF INVENTION

[0026] The high mutation rate of human mitochondrial DNA has been thought to result in the accumulation of a wide range of neutral, population-specific

base substitutions in mtDNA. These have accumulated sequentially along radiating maternal lineages that have diverged approximately on the same time scale as human populations have colonized different geographical regions of the world.

[0027] About 76% of all African mtDNAs fall into haplogroup L, defined by an HpaI restriction site gain at bp 3592. 77% of Asian mtDNAs are encompassed within a super-haplogroup defined by a DdeI site gain at bp 10394 and an AluI site gain at bp 10397. Essentially all native American mtDNAs fall into four haplogroups, A-D. Haplogroup A is defined by a HaeIII site gain at bp 663, B by a 9 bp deletion between bp 8271 and bp 8281, C by a HincII site loss at bp 13259, and D by an AluI site loss at bp 5176. Ten haplogroups encompass almost all mtDNAs in European populations. The ten mtDNA haplogroups of Europeans can be surveyed by using a combination of data from RFLP analysis of the coding region and sequencing of the hypervariable segment I. About 99% of European mtDNAs fall into one of ten haplogroups: H, I, J, K, M, T, U, V, W or X.

[0028] This invention provides human mtDNA polymorphisms that are diagnostic of all the major human haplogroups and methods of diagnosing those haplogroups and selected sub-haplogroups.

[0029] This invention also provides methods for identifying evolutionarily significant mitochondrial DNA genes, nucleotide alleles, and amino acid alleles. Evolutionarily significant genes and alleles are identified using one or two populations of a single species. The process of identifying evolutionarily significant nucleotide alleles involves identifying evolutionarily significant genes and then evolutionarily significant nucleotide alleles in those genes, and identifying evolutionarily significant amino acid alleles involves identifying amino acids encoded by all nonsynonymous alleles. Synonymous codings of the nucleotide alleles encoding evolutionarily significant amino acid alleles of this invention are equivalent to the evolutionarily significant amino acid alleles disclosed herein and are included within the scope of this invention. Synonymous codings include alleles at neighboring nucleotide loci that are within the same codon.

[0030] This invention also provides methods for associating haplogroups and evolutionarily significant nucleotide and amino acid alleles with predispositions to physiological conditions. Methods for diagnosing predisposition to LHON, and methods for diagnosing increased likelihood of developing blindness, centenaria, and increased longevity that are not dependent on the geographical location of the individual being diagnosed are provided herein. Diagnosis of an individual with a predisposition to an energy metabolism-related physiological condition is dependent on the geographic region of the individual. Physiological conditions diagnosable by the methods of this invention include healthy conditions and pathological conditions. Physiological conditions that are associated with haplogroups and with alleles provided by this invention include energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease.

[0031] Molecules having sequences provided by this invention are provided in libraries and on genotyping arrays. This invention provides methods of making and using the genotyping arrays of this invention. The arrays of this invention are useful for determining the presence and absence of nucleotide alleles of this invention, for determining a haplogroup, and for diagnosis.

[0032] This invention also provides machine-readable storage devices and program devices for storing data and programmed methods for diagnosing haplogroups and physiological conditions.

[0033] The arrays of this invention are useful for determining the presence and absence of nucleotide alleles of this invention, for determining a haplogroup, and for diagnosis. This invention also provides machine-readable storage devices and program devices for storing data and programmed methods for diagnosing haplogroups and physiological conditions.

BRIEF DESCRIPTION OF THE FIGURES

[0034] FIG. 1 shows a consensus neighbor-joining tree of 104 human mtDNA complete sequences and two primate sequences. Numbers correspond to bootstrap values (% of 500 total bootstrap replicates) (Felsenstein, J. (1993) PHYLIP (Phylogeny Inference Package) 3.53c. Distributed by author, Department of Genetics, University of Washington, Seattle, Wash.). Maximum Likelihood (ML) and UPGMA yielded consistent branching orders with respect to continent-specific mtDNA haplogroups. Sequences: 11-53: Genbank AF346963-AF347015 (4); E21U: Genbank X93334; A1L1a: Genbank D38112; cam revise: Genbank NC_001807 corrected according to R. M. Andrews et al., Nature Genetics 23, 147 (1999); the rest are 48 sequences generated in this invention using an ABI 377. Specific mutations in patient samples that have been implicated in disease were excluded from this analysis, as well as gaps and deletions, with the exception of the 9 bp deletion (nucleotide position (np) 8272 to 8281). Haplogroups A, B, C, D, and X were drawn from both Eurasia and the Americas. Haplogroup names are designated with capital letters. P. paniscus and P. troglodytes mtDNA sequences were used as outgroups. Haplogroups L0 and L1 encompass previously assigned L1a and L1b mtDNAs, respectively (Y. S. Chen et al., American Journal of Human Genetics 66, 1362-1383 (2000)).

[0035] FIG. 2 shows the migrations of human haplogroups around the world. +/+, +/-, or -/- equals Dde I 10394 and Alu I 10397. * equals Rsa I 16329. The mutation rate is 2.2-2.9% per million years. Time estimates are YBP (years before present).

[0036] FIG. 3 shows a cladogram listing nucleotide alleles describing 21 major human haplogroups, 21 sub-haplogroups, and several macro-haplogroups. The groups on the left are described by the alleles to their right. A vertical bar designates that each group to the left of the bar has all of the alleles to the right of the bar.

[0037] FIG. 4 shows the selective constraint (k values) of mtDNA protein genes with comparisons among mammalian species. Statistical significance (P<0.05) was determined using ANOVA, t-tests or the Tukey-Kramer Multiple Comparisons tests. Most programs used are from DNAsp (J. Rozas and R. Rozas, (1999) Bioinformatics 15:174-5). DNA sequence divergence was analyzed using the DIVERGE program (Wisconsin Package Version 10.0, Genetics Computer Group (GCG), Madison, Wis.). For all thirteen mtDNA genes, data is shown for human, human compared to P. troglodytes, human compared to P. paniscus, and nine species of primates. For only ATP6 and ATP8, data is also shown for fourteen species of mammals.

DETAILED DESCRIPTION OF THE INVENTION

[0038] Table 1 shows human mitochondrial nucleotide alleles that have been associated with physiological conditions. In Table 1, columns three (nucleotide locus), five (physiological condition nucleotide allele), and two (physiological condition) make up the set of Human Mitochondrial Nucleotide Alleles Known to be Associated with Physiological Conditions.
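The restriction-site changes listed in paragraph [0027] lend themselves to a simple lookup. The following is a minimal sketch only, not the genotyping method of this invention: the marker strings (for example "HpaI+3592" for a site gain and "AluI-5176" for a site loss), the dictionary name, and the function name are hypothetical labels introduced for illustration. The unnamed Asian super-haplogroup is labeled descriptively because paragraph [0027] does not name it.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical encoding of the changes in paragraph [0027]:
# "+" marks a restriction-site gain, "-" a site loss.
DEFINING_CHANGES: Dict[str, FrozenSet[str]] = {
    "L": frozenset({"HpaI+3592"}),
    "Asian super-haplogroup": frozenset({"DdeI+10394", "AluI+10397"}),
    "A": frozenset({"HaeIII+663"}),
    "B": frozenset({"del9bp_8271-8281"}),
    "C": frozenset({"HincII-13259"}),
    "D": frozenset({"AluI-5176"}),
}


def candidate_groups(observed: Set[str]) -> List[str]:
    """Return every group whose defining changes are all present in `observed`."""
    return [
        name
        for name, required in DEFINING_CHANGES.items()
        if required <= observed  # subset test: all defining changes observed
    ]


if __name__ == "__main__":
    sample = {"DdeI+10394", "AluI+10397", "AluI-5176"}
    print(candidate_groups(sample))
```

The function returns every group whose defining changes are all observed, so a sample may match more than one entry; deciding among matches would require the fuller allele sets described elsewhere in this document.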

TABLE 1

Human Mitochondrial Alleles Known to be Associated with Physiological Conditions

Gene | Physiological Condition | Nucleotide Locus | Alleles
MTND1 | *MELAS | 3308 |
MTND1 | *NIDDM; LHON; PEO | 3316 |
MTND1 | *LHON | 3394 |
MTND1 | *NIDDM | 3394 |
MTND1 | *ADPD | 3397 |
MTND1 | *LHON | 3460 |
MTND1 | *LHON | 3496 |
MTND1 | *LHON | 3497 |
MTND1 | *LHON | 4136 |
MTND1 | *LHON | 4160 |
MTND1 | *LHON | 4216 |
MTND2 | *LHON | 4917 |
MTND2 | *LHON | 5244 |
MTND2 | AD | 5460 |
MTND2 | AD | 5460 |
MTCO1 | *Myoglobinuria, Exercise Intolerance | 5920 |
MTCO1 | *Multisystem Disorder | 6930 |
MTCO1 | *LHON | 7444 |
MTCO2 | *Mitochondrial Encephalomyopathy | 7587 |
MTCO2 | *MM | 7671 |
MTCO2 | *Multisystem Disorder | 7896 | Ter
MTCO2 | *Lactic Acidosis | 8042 | T nt del (AT); Ter
MTATP6 | *NARP | 8993 |
MTATP6 | *NARP/Leigh Disease | 8993 |
MTATP6 | *LHON | 9101 |
MTATP6 | *FBSN/Leigh Disease | 9176 |
MTATP6 | *Leigh Disease | 9176 |
MTCO3 | *LHON | 9438 |
MTCO3 | *Leigh-like | 9537 | frameshift
MTCO3 | *LHON | 9738 |
MTCO3 | *LHON | 9804 |
MTCO3 | *Mitochondrial Encephalopathy | 9952 |
MTCO3 | *PEM; MELAS | 9957 |
MTND3 | *ESOC | 10191 |
MTND4 | *MELAS | 11084 |
MTND4 | *LHON | 11778 |
MTND4 | *Exercise Intolerance | 11832 |
MTND4 | *DM | 12026 |
MTND5 | *MELAS | 13513 |
MTND5 | *MELAS | 13514 |
MTND5 | *LHON-like | 13528 |
MTND5 | *LHON | 13708 |
MTND5 | *LHON | 13730 |
MTND6 | *MELAS | 14453 |
MTND6 | *LDYT | 14459 |
MTND6 | *LHON | 14484 |
MTND6 | *LHON | 14495 |
MTND6 | *LHON | 14568 |
MTCYB | *PD/MELAS | 14787 | TTAA; frameshift

TABLE 1-continued
Human Mitochondrial Alleles Known to be Associated with Physiological Conditions

Gene     Physiological Condition                 Nucleotide  Cambridge   Physiological  Cambridge   Physiological
                                                 Locus       Nucleotide  Condition      Amino Acid  Condition
                                                             Allele      Nucleotide     Allele      Amino Acid
                                                                         Allele                     Allele
MTCYB    *MM                                     15059       G           A              G           Ter
MTCYB    Exercise Intolerance                    15150       G           A              W           Ter
MTCYB    Exercise Intolerance                    15197       T           C              S           P
MTCYB    *Mitochondrial Encephalomyopathy        15242       G           A              G           Ter
MTCYB    *LHON                                   15257       G           A              D           N
MTCYB    Exercise Intolerance                    15615       G           A              G           D
MTCYB    *MM                                     15762       G           A              G           E
MTCYB    *LHON                                   15812       G           A              V           M

(MITOMAP: A Human Mitochondrial Genome Database. Center for Molecular Medicine, Emory University, Atlanta, GA, USA. http://www.gen.emory.edu/mitomap.html, 2001).

*Definitions:
LHON   Leber Hereditary Optic Neuropathy
MM     Mitochondrial Myopathy
AD     Alzheimer's Disease
LIMM   Lethal Infantile Mitochondrial Myopathy
ADPD   Alzheimer's Disease and Parkinson's Disease
MMC    Maternal Myopathy and Cardiomyopathy
NARP   Neurogenic muscle weakness, Ataxia, and Retinitis Pigmentosa; alternate phenotype at this locus is reported as Leigh Disease
FICP   Fatal Infantile Cardiomyopathy Plus, a MELAS-associated Cardiomyopathy
MELAS  Mitochondrial Encephalomyopathy, Lactic Acidosis, and Stroke-like episodes
LDYT   Leber's hereditary optic neuropathy and DYsTonia
MERRF  Myoclonic Epilepsy and Ragged Red Muscle Fibers
MHCM   Maternally inherited Hypertrophic CardioMyopathy
CPEO   Chronic Progressive External Ophthalmoplegia
KSS    Kearns Sayre Syndrome
DM     Diabetes Mellitus
DMDF   Diabetes Mellitus + Deafness
CIPO   Chronic Intestinal Pseudoobstruction with myopathy and Ophthalmoplegia
DEAF   Maternally inherited DEAFness or aminoglycoside-induced DEAFness
PEM    Progressive encephalopathy
SNHL   SensoriNeural Hearing Loss

[0039] Thirteen protein-coding mitochondrial genes are known (Mito Map, http://www.gen.emory.edu/cgi-bin/MITOMAP).

TABLE 2
Protein-coding Human MtDNA Genes

Gene                        Map Locus  Abbreviation  Location
NADH dehydrogenase 1        MTND1      ND1           3307-4262
NADH dehydrogenase 2        MTND2      ND2           4470-5511
NADH dehydrogenase 3        MTND3      ND3           10059-10404
NADH dehydrogenase 4L       MTND4L     ND4L          10470-10766
NADH dehydrogenase 4        MTND4      ND4           10760-12137
NADH dehydrogenase 5        MTND5      ND5           12337-14148
NADH dehydrogenase 6        MTND6      ND6           14149-14673
Cytochrome b                MTCYB      Cytb          14747-15887
Cytochrome c oxidase I      MTCO1      COI           5904-7445
Cytochrome c oxidase II     MTCO2      COII          7586-8269
Cytochrome c oxidase III    MTCO3      COIII         9207-9990
ATP synthase 6              MTATP6     ATP6          8527-9207
ATP synthase 8              MTATP8     ATP8          8366-8572

As defined on Mito Map, http://www.gen.emory.edu/cgi-bin/MITOMAP, which is numbered relative to the Cambridge Sequence (Genbank accession no. J01415 and Andrews et al. (1999), A Reanalysis and Revision of the Cambridge Reference Sequence for Human Mitochondrial DNA, Nature Genetics 23: 147).

[0040] Codon usage for mtDNA differs slightly from the universal code. For example, UGA codes for tryptophan instead of termination, AUA codes for methionine instead of isoleucine, and AGA and AGG are terminators instead of coding for arginine.

[0041] As used herein, "printing" refers to the process of creating an array of nucleic acids on known positions of a solid substrate. The arrays of this invention can be printed by spotting, e.g., applying arrays of probes to a solid substrate, or to the synthesis of probes in place on a solid substrate. As used herein, "glass slide" refers to a small piece of glass of the same dimensions as a standard microscope slide. As used herein, "prepared substrate" refers to a substrate that is prepared with a substance capable of serving as an attachment medium for attaching the probes to the substrate, such as polylysine. As used herein, "sample" refers to a composition containing human mitochondrial DNA that can be genotyped. As used herein, "quantitative hybridization" refers to hybridization performed under appropriate conditions and using appropriate materials such that the sequence of one nucleotide allele (a single nucleotide polymorphism) can be determined, such as by hybridization of a molecule containing that allele to two or more probes, each containing different alleles at that nucleotide locus, all as is known in the art.

[0042] As used herein, "physiological condition" includes diseased conditions, healthy conditions, and cosmetic conditions. Diseased conditions include, but are not limited to, metabolic diseases such as diabetes, hypertension, and cardiovascular disease. Healthy conditions include, but are not limited to, traits such as increased longevity. Physiological conditions include cosmetic conditions. Cosmetic conditions include, but are not limited to, traits such as amount of body fat. Physiological conditions can change health status in different contexts, such as for the same organism in a different environment. Such different environments for humans are different cultural environments or different climatic contexts such as are found on different continents.

[0043] As used herein, "neutrality analysis" refers to analysis to determine the neutrality of one or more nucleotide alleles and/or the gene containing the allele(s) using at least two alleles of a sequence. Commonly, the alleles in a sequence to be analyzed are divided into two groups, synonymous and nonsynonymous. Codon usage tables showing which codons encode which amino acids are used in this analysis. Codon usage tables for many organisms and genomes are available in the art. If a gene is determined to not be neutral, the gene is determined to have had selection pressure applied to it during evolution, and to be evolutionarily significant. The alleles that change amino acids in the gene (nonsynonymous) are then determined to be non-neutral and evolutionarily significant.

[0044] As used herein, "Ka/Ks" refers to a ratio of the proportion of nonsynonymous differences to the proportion of synonymous differences in a DNA sequence analysis, as is known to the art. The proportion of nonsynonymous differences is the number of nonsynonymous nucleotide substitutions in a sequence per site at which a nonsynonymous substitution could occur. The proportion of synonymous differences is the number of synonymous nucleotide substitutions in a sequence per site at which synonymous substitutions could occur. Alternatively, instead of only including the number of sites in the denominator of each proportion, the number of alternative substitutions that could occur at each site are also included. Either definition may be used as long as similar definitions are used for both Ka and Ks in an analysis. K is K/K.

[0045] As used herein, "nonsynonymous" refers to mutations that result in changes to the encoded amino acid. As used herein, "synonymous" refers to mutations that do not result in changes to the encoded amino acids.

[0046] As used herein, "haplogroup" refers to radiating lineages on the human evolutionary tree, as is known in the art. As used herein, "macro-haplogroup" refers to a group of evolutionarily related haplogroups. As used herein, "sub-haplogroup" refers to an evolutionarily related subset of a haplogroup. An individual's haplotype is the haplogroup to which he belongs.

[0047] As used herein, "extended longevity" or "extended lifespan" refers to living longer than the average expected lifespan for the population to which one belongs. As used herein, "centenaria" refers to an extended lifespan that is at least 100 years.

[0048] As used herein, "abnormal energy metabolism" in an individual who is non-native to the geographical region in which he lives refers to energy metabolism that differs from that of the population that is native to where the individual lives. As used herein, "abnormal temperature regulation" in such an individual refers to temperature regulation that differs from that of the population that is native to where he lives. As used herein, "abnormal oxidative phosphorylation" in such an individual refers to oxidative phosphorylation that differs from that of the population that is native to where he lives. As used herein, "abnormal electron transport" in such an individual refers to electron transport that differs from that of the population that is native to where he lives. As used herein, "metabolic disease" of such an individual refers to metabolism that differs from that of the population that is native to where he lives. As used herein, "energetic imbalance" of such an individual refers to a balance of energy generation or use that differs from that of the population that is native to where he lives. As used herein, "obesity" of such an individual refers to a body weight that, for the height of the individual, is 20% higher than the average body weight that is recommended for the population native to where the individual lives. As used herein, "amount of body fat" of such an individual refers to a low or high percentage of body fat relative to what is recommended for the population that is native to where he lives.

[0049] As used herein, an isolated nucleic acid is a nucleic acid outside of the context in which it is found in nature. The term covers, for example: (a) a DNA which has the sequence of part of a naturally-occurring genomic DNA molecule but is not flanked by both of the coding or noncoding sequences that flank that part of the molecule in the genome of the organism in which it naturally occurs; (b) a nucleic acid incorporated into a vector or into the genomic DNA of a prokaryote or eukaryote in a manner such that the resulting molecule is not identical to any naturally-occurring vector or genomic DNA; (c) a separate molecule such as a cDNA, a genomic fragment, a fragment produced by polymerase chain reaction (PCR), or a restriction fragment; and (d) a recombinant nucleotide sequence that is part of a hybrid gene, i.e., a gene encoding a fusion protein, or a modified gene having a sequence not found in nature.

[0050] As used herein, "nucleotide locus" refers to a nucleotide position of the human mitochondrial genome. The Cambridge sequence SEQ ID NO:2 is used as a reference sequence, and the positions of the mitochondrial genome referred to herein are assigned relative to that sequence. As used herein, "loci" refers to more than one locus. As used herein, "nucleotide allele" refers to a single nucleotide at a selected nucleotide locus from a selected sequence when different bases occur naturally at that locus in different individuals. The nucleotide allele information is provided herein as the nucleotide locus number and the base that is at that locus, such as 3796C, which means that at human mitochondrial position 3796 in the Cambridge sequence, there is a cytosine (C). As used herein, "amino acid allele" refers to the amino acid that is at a selected amino acid location in the human mitochondrial genome when different amino acids occur naturally at that location in different individuals. There are thirteen protein-coding genes in the human mitochondria. For each gene, the encoded protein consists of amino acids that are numbered starting at one. ND1 304H means that there is a histidine at amino acid 304 in the ND1 protein. Amino acids are encoded by codons. As used herein, "codon" refers to the group of three nucleotides that encode an amino acid in a protein, as is known in the art. An amino acid allele can be referred to by one or more of the nucleotide loci that code for it. For example, ntl 15884P means that there is a proline (P) encoded by the codon containing nucleotide locus 15884.
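The codon differences and the allele notation described above can be illustrated with a short Python sketch. It is a minimal illustration rather than part of the disclosed methods: the function names (`translate`, `is_synonymous`, `parse_nucleotide_allele`) are hypothetical, the universal code table is the conventional one, and the only mitochondrial deviations encoded are the four codons named in the text (UGA, AUA, AGA, AGG).

```python
import re

# Universal genetic code, laid out in the conventional U, C, A, G order.
BASES = "UCAG"
STANDARD_AA = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Deviations of the human mitochondrial code named in the text:
# UGA -> tryptophan, AUA -> methionine, AGA/AGG -> termination ("*").
MITO_OVERRIDES = {"UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}


def translate(codon, mitochondrial=True):
    """Return the one-letter amino acid (or '*' for termination) for a codon."""
    rna = codon.upper().replace("T", "U")
    if mitochondrial and rna in MITO_OVERRIDES:
        return MITO_OVERRIDES[rna]
    first, second, third = (BASES.index(base) for base in rna)
    return STANDARD_AA[16 * first + 4 * second + third]


def is_synonymous(codon, position, new_base):
    """True if replacing the base at `position` (0, 1 or 2) keeps the amino acid."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    return translate(codon) == translate(mutated)


def parse_nucleotide_allele(label):
    """Split a label such as '3796C' into (locus, base)."""
    match = re.fullmatch(r"(\d+)([ACGT])", label)
    if match is None:
        raise ValueError(f"not a nucleotide allele label: {label!r}")
    return int(match.group(1)), match.group(2)
```

For instance, `translate("TGA")` gives tryptophan under the mitochondrial code but termination when `mitochondrial=False`, which is why classifying an mtDNA difference as synonymous or nonsynonymous has to use the mitochondrial table.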

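The ratio of nonsynonymous to synonymous proportions defined above can be written as a small worked computation. The Python sketch below assumes the simplest of the two definitions given in the text (differences divided by the number of sites at which such a difference could occur) and applies no correction for multiple substitutions at one site; the function names and the example counts are illustrative assumptions, not values from the tables.

```python
def proportion(differences, sites):
    """Differences per site at which such a difference could occur."""
    if sites <= 0:
        raise ValueError("number of sites must be positive")
    return differences / sites


def ka_ks_ratio(nonsyn_differences, nonsyn_sites, syn_differences, syn_sites):
    """Ratio of the nonsynonymous proportion (Ka) to the synonymous one (Ks).

    Returns None when no synonymous differences were observed, because the
    ratio is then undefined.
    """
    ka = proportion(nonsyn_differences, nonsyn_sites)
    ks = proportion(syn_differences, syn_sites)
    if ks == 0:
        return None
    return ka / ks


# Hypothetical counts: 4 nonsynonymous differences over 256 nonsynonymous
# sites and 8 synonymous differences over 128 synonymous sites.
example_ratio = ka_ks_ratio(4, 256, 8, 128)
```

Whichever definition of the denominators is chosen, the same one has to be used for both proportions, which is why a single `proportion` helper serves both.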
[0051] As used herein, "evolutionarily significant gene" refers to a gene that has statistically significantly more nonsynonymous nucleotide changes, when compared to the corresponding gene in another individual, than would be expected by chance. As used herein, "evolutionarily significant nucleotide allele" refers to a nucleotide allele that is located in a gene that has been determined to be evolutionarily significant using that nucleotide allele, or an equivalent nucleotide allele in a corresponding gene in another individual. As used herein, "intraspecific" means within one species. As used herein, "subpopulation" refers to a population within a larger population. A subpopulation can be as small as one individual. As used herein, "geographic region" refers to a geographic area in which a statistically significant number of individuals have the same haplotype. As used herein, being "native" to a geographic region refers to having the haplotype associated with that geographic region. The haplotype associated with a geographic region is that which originated in the region or that of many individuals who settled historically in the region with respect to human evolution.

[0052] As used herein, "target" or "target sample" refers to the collection of nucleic acids used as a sample for array analysis. The target is interrogated by the probes of the array. A "target" or "target sample" may be a mixture of several samples that are combined. For example, an experimental target sample may be combined with a differently labeled control target sample and hybridized to an array, the combined samples being referred to as the "target" interrogated by the probes of the array during that experiment. As used herein, "interrogated" means tested. Probes, targets, and hybridization conditions are chosen such that the probes are capable of interrogating the target, i.e., of hybridizing to complementary sequences in the target sample.

[0053] As used herein, "increased likelihood of developing blindness" refers to a higher than normal probability of losing the ability to see normally and/or of losing the ability to see normally at a younger age.

[0054] All sequences defined herein are meant to encompass the complementary strand as well as double-stranded polynucleotides comprising the given sequence.

[0055] This invention provides a list of human mtDNA polymorphisms found in all the major human haplogroups. Example 1 summarizes data from sequencing over 100 human mtDNA genomes that are representative of the major human haplogroups around the world. The summary includes over 900 point mutations and one nine-base pair deletion. Table 3, Human MtDNA Nucleotide Alleles, lists the alleles identified in 103 such sequences in the third column, the corresponding alleles of the Cambridge mtDNA sequence in the second column, and the nucleotide loci (position in the Cambridge sequence) in the first column. Table 3 lists the set of human mtDNA nucleotide alleles that occur naturally in different haplogroups. Table 3 does not include alleles previously known to be associated with disease (i.e., does not include the alleles of Table 1). The nucleotide alleles listed in column three of Table 3, together with the corresponding nucleotide loci in column one, make up the set of non-Cambridge human mtDNA nucleotide alleles. Table 4 lists the nucleotide alleles identified by the inventors hereof in 48 human mtDNA genomes in column three, and the corresponding Cambridge alleles in column two. Columns one and three of Table 4 make up the set of non-Cambridge human mtDNA nucleotide alleles in 48 genomes.

[0056] The nucleotide alleles listed in Table 3, including the Cambridge nucleotide alleles, being naturally occurring, are useful for identifying alleles that are associated with abnormal physiological conditions. These nucleotide alleles can be ignored during analysis steps when performing methods for identifying novel alleles associated with selected physiological conditions.

[0057] As described below, certain alleles of Table 3 are useful for identifying physiological conditions related to energy metabolism such as energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease when the affected individuals have the abnormal physiological condition because they are in a geographical region that is not native for their haplogroup.

[0058] The nucleotide alleles listed in Table 3, including the Cambridge nucleotide alleles, are also useful for identifying mtDNA sequences associated with and diagnostic of human haplogroups. Example 2 summarizes phylogenetic analyses of the sequence data of the 103 individuals and the Cambridge sequence along with two chimpanzee mtDNA sequences. The results are shown in FIG. 1 in a cladogram. Calculations of the time since the most recent common ancestor (MRCA) are shown in Table 5. The 104 individuals were chosen from known haplogroups, and the corresponding haplogroups are labeled on the figure. Combining the sequence data of the 104 individuals with FIG. 1 and the geographic regions native to human haplogroups, as is known in the art, results in FIG. 2 (Example 3), which tracks human mtDNA migrations. Analysis of several mtDNA genomic sequences representing each haplogroup demonstrated which alleles are segregating within a haplogroup as well as which alleles are present in every individual within one or more haplogroups. The alleles that are present in every individual within each haplogroup are shown in FIG. 3 (Example 4). On the left, sub-haplogroups and haplogroups are listed. Macro-haplogroups are shown in parentheses. Nucleotide loci and alleles that are present in all the members of each group (sub-haplo or haplo) are listed. A vertical bar designates that all of the alleles to the right are present in all of the haplogroups and/or sub-haplogroups to the left. FIG. 3 is drawn as a cladogram. For example, FIG. 3 demonstrates that the macro-haplogroup (R) individuals all contain 12705C and 16223C, and no other individuals are known to have these alleles; therefore macro-haplogroup (R) can be diagnosed by identifying, in a sample containing mtDNA, the presence of either 12705C or 16223C. Similarly, macro-haplogroup (N) can be diagnosed by identifying the presence of 8701A, 9540T, or 10873T.

[0059] Analysis of the data in FIG. 3 demonstrated sets of alleles useful for diagnosing the haplogroups (Example 5). These alleles are listed by haplogroup in Tables 6 and 7, and by sub-haplogroup in Tables 8 and 9. A set of alleles useful for diagnosing all of the haplogroups and sub-haplogroups in FIG. 3 is listed in Table 10. Table 10 lists the nucleotide loci in column one and the nucleotide alleles useful for diagnosing haplogroups in column two. Table 10 contains

some alleles from the Cambridge sequence. There are many equivalent methods for diagnosing the haplogroups. Methods for diagnosing haplogroups that require testing only one or a few loci are listed in Example 5. The presence of only one particular allele is usually sufficient for diagnosing a haplogroup; however, often it is not known which locus needs to be tested. By determining the allele at each nucleotide locus listed in Table 10, the haplogroup of an unknown sample can be diagnosed. Alternatively, macro-haplogroups can be diagnosed or excluded first, thereby decreasing the number of loci that need to be tested to distinguish between the remaining possible haplogroups. Alleles useful for diagnosing macro-haplogroups by methods that require testing only one or a few loci are included in Table 11. Further analysis of the data provided by this invention will demonstrate which sets of alleles identify additional sub-haplogroups and additional macro-haplogroups.

[0060] Diagnosing the haplogroup of a sample is useful in criminal investigations and forensic analyses. Identifying a sample as belonging to a particular haplogroup, and knowing which alleles have not been associated with a selected physiological condition and context, are useful when identifying novel alleles associated with a selected physiological condition, as described above and in Example 6. Diagnosing the haplogroup of a sample is also useful for identifying a novel allele associated with a selected physiological condition when the novel allele causes the physiological condition only in the genetic context of a particular haplogroup, as shown in Example 6. In Example 6, the list of alleles associated with haplogroups found in Russia was used in the sequence analysis of two Russian LHON families. By eliminating alleles listed in Table 3, two novel mutations were identified that are associated with LHON. These new complex I mutations, 3635A and 4640C, are useful for diagnosing a predisposition to Leber Hereditary Optic Neuropathy (LHON).

[0061] Example 7 demonstrates the identification of a new primary LHON mutation, 10663C, in complex I, that appears to cause a predisposition to LHON only when associated with haplogroup J. Haplogroup J is defined by a nonsynonymous difference that is useful for diagnosing haplogroup J, 458T in ND5. This invention provides a method of diagnosing a person with a predisposition to LHON and/or to developing early onset blindness by identifying, in a sample containing mtDNA from the person, the nucleotide allele 10663C, or a synonymous nucleotide allele of 10663C, and also identifying alleles diagnostic of haplogroup J, such as 458T in ND5. Because ND5 458T is a missense mutation in all haplogroup J individuals, this particular mutation may be directly involved in causing LHON. ND1 304H is another missense mutation that is present in all haplogroup J individuals, and may also be directly involved in causing LHON. 458T is also present in haplogroup T individuals. Haplogroup J is also associated with a predisposition to centenaria and an extended lifespan. ND5 458T and ND1 304H may also be directly involved in causing the predisposition to centenaria and extended lifespan.

[0062] Example 8 demonstrates the importance of demographic factors in intercontinental mtDNA sequence radiation. Haplogroups are combined and separated into various populations for statistical analyses.

[0063] Previously in the art, it has been thought that polymorphisms in human mtDNA, such as the nucleotide alleles listed in Table 3, were neutral in all contexts and could not be associated with physiological conditions. It has been thought that differences in human mtDNA diversity associated with inter-continental migrations were due to random genetic drift (e.g., founder effects followed by rapid population expansion). In this invention, the biological and clinical significance of these human mtDNA polymorphisms is disclosed. The neutrality of the nucleotide alleles listed in Table 3 was tested using neutrality analysis (Examples 9-12).

[0064] Some of the nucleotide loci in Table 3 are located in the mitochondrial protein-coding genes (Table 2). Of those loci, some of the identified nucleotide alleles alter the protein encoded by the codon in which the nucleotide locus resides. This is determined using the mitochondrial codon usage table, as is known in the art. Nucleotide alleles that change an amino acid are called missense mutations, missense polymorphisms, or nonsynonymous differences. Missense polymorphisms alter the protein sequence relative to a compared sequence, but they still may be neutral because they do not affect the function of the encoded protein. Without performing biochemical studies on the affected proteins, statistical analyses can be performed to determine whether a polymorphism is neutral, whether evolution imposed selection on the encoding allele, and whether that selection is positive. This invention provides results of the statistical analyses of the polymorphisms in Table 3 and provides a list of which alleles are not neutral, and therefore evolutionarily significant.

[0065] Neutrality testing of nucleotide alleles first requires neutrality testing of the genes containing those nucleotide alleles. Neutrality testing of one or more genes by comparing two sets of allelic genes from two intraspecific populations was performed, as described in Example 9. Haplogroups were combined to make populations for the comparison. In Example 9, nucleotide alleles from the entire coding region of the mtDNA genome, representing haplogroups native to a geographic region, were combined to make a first population and first set of sequences. Nucleotide alleles of the entire coding region of the mtDNA genome, from haplogroups native to a different geographic region, were combined to make the second population and the second set of sequences. Nucleotide alleles were divided into those encoding synonymous and non-synonymous differences. The ratio of Ka/Ks for each gene, separated by the population containing the allele, is shown in Table 12. Neutrality testing of genes by comparing one set of at least two nucleotide alleles of at least one gene from one population of one species was performed in Example 10. In Example 10, sequences of the entire coding region of the mtDNA genome, of haplogroups in all geographic regions on earth, were combined to make one population and set of sequences for analysis. FIG. 4 shows the results of the comparison of one set of sequences from one population of only one species, 104 human sequences. Example 11 includes comparisons of sets of sequences between two populations: human vs. P. paniscus, human vs. P. troglodytes, human vs. eight other primate species, and human vs. thirteen mammalian species.

[0066] To identify an evolutionarily significant gene, two sets of nucleotide sequences, each set from a different population, are compared to each other. Nucleotide sequences representing parts of genes or one or more whole genes are useful. The sets of sequences are compared to each other by neutrality analysis. Differences in the sequences from each set are determined to be synonymous or nonsynonymous differences. The proportion of nonsynonymous differences is compared to the proportion of synonymous differences (Ka/Ks). The results of the analysis are compiled in a data set and the data set is analyzed, as is known in the art, to identify one or more evolutionarily significant genes. When nonsynonymous differences occur significantly more often, relative to synonymous differences, than is expected by chance, the gene or part of the gene is determined to be evolutionarily significant. When synonymous differences occur significantly more often, relative to nonsynonymous differences, than is expected by chance, the gene or part of the gene is determined to be conserved. When the ratio is as expected by chance, then there is no evidence of selection or evolutionary significance.

[0067] To identify an evolutionarily significant gene, only one set of nucleotide sequences (from only one population) may also be analyzed, e.g., the nucleotide sequences representative of humans living on one continent. When only one set of sequences is analyzed, the set must contain at least two corresponding nucleotide alleles (i.e., there must be sequence polymorphism). Corresponding sequences are sequences of the same gene or gene part from at least two individuals. The sequences from different individuals within the population must contain polymorphisms with respect to each other. Differences in the sequences relative to each other are determined to be synonymous or nonsynonymous. Neutrality analysis is performed to generate a data set. The data set is analyzed to identify an evolutionarily significant gene. If an analysis determines that none of the analyzed genes are evolutionarily significant, the set of nucleotide sequences can be increased, such as by increasing the size of the population from which the sequences are derived, to determine if one or more genes are evolutionarily significant in the enlarged population.

[0068] Example 12 is similar to Example 9 except that the data is further analyzed by manipulating Ka/Ks to K. Examples 9-12 demonstrate that all but one mtDNA gene are not neutral and therefore are evolutionarily significant. Genes are determined to not be neutral by statistical significance tests known in the art. Some genes are only evolutionarily significant when comparing selected populations. For example, ND4 was demonstrated to be significant when comparing Native American sequences to African sequences and when comparing all human sequences to each other, but not when comparing European to African sequences. ND4L is the only mtDNA gene not shown to be evolutionarily significant by the current analyses. ND4L might be demonstrated to be evolutionarily significant by the methods of this invention using one or more different populations or using only part of the gene sequence. In Examples 9-12, the entire sequence of each gene was used for analysis; however, portions of genes are also useful in the methods of this invention. The statistical significance tests prevent too small a gene portion from being used to determine non-neutrality.

[0069] After identifying evolutionarily significant genes, evolutionarily significant nucleotide alleles can be identified. To identify an evolutionarily significant nucleotide allele, the steps for identifying an evolutionarily significant gene, using one or two populations, are performed with the addition of a step of analyzing the sequence data set to determine an evolutionarily significant nucleotide allele. An evolutionarily significant nucleotide allele is part of a sequence encoding an allelic amino acid in an evolutionarily significant gene or part of a gene. Examples 13 and 14 demonstrate identification of evolutionarily significant nucleotide alleles and evolutionarily significant amino acid alleles in the evolutionarily significant genes identified in Examples 9-12. Evolutionarily significant amino acid alleles are the amino acids encoded by the codons containing evolutionarily significant nucleotide alleles. In these examples, nucleotides at loci not listed in Table 3 are identical to the Cambridge sequence so that the entire codon containing an evolutionarily significant nucleotide allele and the amino acid encoded by that codon can be determined. All nucleotide alleles that are part of a codon encoding the same amino acid as an evolutionarily significant amino acid allele identified herein, or identified by methods of this invention, are also evolutionarily significant and are intended to be within the scope of this invention. An evolutionarily significant amino acid allele may include more than one nucleotide allele, such as at two neighboring nucleotide loci. Evolutionarily significant nucleotide alleles and evolutionarily significant amino acid alleles in human mitochondrial sequences, identified by the methods of this invention, are listed in Table 14. In column one, Table 14 lists the gene containing the alleles, column two indicates the locus of the nucleotide allele, column three lists the Cambridge nucleotide allele at that nucleotide locus, column four lists a non-Cambridge allele of this invention, column five lists the amino acid encoded by the codon containing the Cambridge nucleotide allele (when other Cambridge nucleotides are present at the other nucleotide loci of the codon), and column six lists the amino acid encoded by the codon containing the non-Cambridge allele (when Cambridge nucleotides are present at the other nucleotide loci of the codon). Columns two, three, and four make the set of evolutionarily significant human mitochondrial nucleotide alleles. Columns two, five, and six make the set of evolutionarily significant human mitochondrial amino acid alleles. Table 14 designates the nucleotide locus of the listed alleles. For the amino acid alleles listed in columns five and six, the relevant loci are all three nucleotide loci in the encoding codon containing the nucleotide locus listed in column two.

[0070] To identify an evolutionarily significant amino acid allele, the steps for identifying an evolutionarily significant gene, using one or two populations, are performed with the addition of two steps: 1) analyzing the data set to determine an evolutionarily significant nucleotide allele; and 2) determining the encoded amino acid allele. An evolutionarily significant amino acid allele is a different amino acid, representing a nonsynonymous difference, relative to the corresponding amino acid allele against which it was compared, wherein the gene has been determined to be evolutionarily significant in the corresponding one or more populations.

[0071] In this invention it is demonstrated that amino acid substitution mutations (nonsynonymous differences) are much more common in human mtDNAs than would be expected by chance, and that most of them are evolutionarily significant. This invention demonstrates that these alleles have become fixed by selection. The mitochondrial genes encode proteins that are responsible for generating energy and for generating heat to maintain body temperature. As humans migrated to different parts of the world, they encountered changes in diet and climate. The high mutation rate of mtDNA and the central role of mitochondrial proteins in cellular energetics make the mtDNA an ideal system for permitting rapid mammalian adaptation to varying climatic and dietary conditions. The increased amino acid sequence variability that has been found among human mtDNA genes is due to the fact that natural selection favored mtDNA alleles that altered the coupling efficiency between the electron transport chain (ETC) and ATP synthesis, determined by the mitochondrial inner membrane proton gradient (ΔΨ). The coupling efficiency between the ETC and ATP synthesis is mediated to a considerable extent by the proton channel of the ATP synthase, which is composed of the mtDNA-encoded ATP6 protein and the nuclear DNA-encoded ATP9 protein. Mutations in the ATP6 gene, which create a more leaky ATP synthase proton channel, reduced ATP production but increased heat production for each calorie consumed. Such a change in energy balance was beneficial in a temperate or arctic climate, but deleterious in a tropical climate. Humans acquiring mtDNA alleles enabling better adaptation to the encountered changes in diet and climate experienced a higher genetic fitness and those alleles were selected for. In particular, these alleles were established genetically because they had an adaptive advantage as humans moved from the African tropics into the Eurasian temperate zone and on into the arctic (FIG. 2). The lack of recombination of the maternally inherited mtDNAs favored the rapid segregation, expression and adaptive selection of advantageous mtDNA alleles. The apparent non-randomness of the differences in non-synonymous versus synonymous mtDNA variation between continents demonstrates that selection also influenced inter-continental colonization. Random genetic hitchhiking, such as in the synonymous alleles, then resulted in identifiable continent-specific haplogroups.

[0072] Modern mtDNA variation has been shaped by adaptation as our ancestors moved into different environmental conditions. Variants that are advantageous in one climatic and dietary environment are maladaptive when

temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease. The method involves testing a sample containing mitochondrial nucleic acid from an individual in a geographic region to determine the haplogroup of the sample and therefore of the individual, comparing the haplogroup of the individual to the set of haplogroups known to be native to that geographic region, and diagnosing the individual human with a predisposition to the above-mentioned conditions if the haplogroup of the individual is not in the set of haplogroups native to that geographic region. This invention enables treatment of one of the above-mentioned conditions that is diagnosed by the above-mentioned method, comprising relocating the diagnosed human to a geographic region that is of similar climate as the region(s) native to the human's haplogroup and/or changing the diagnosed human's diet to more closely match the diet historically available in the region(s) native to the human's haplogroup.

[0073] The above-described method for diagnosing a predisposition to a physiological condition is also useful for associating an amino acid allele with the physiological condition. The evolutionarily significant amino acid alleles present in the haplogroup of the diagnosed individual and not in the haplogroups native to the individual's geographic location are associated with the physiological condition by the methods of this invention. Amino acid alleles, and the corresponding nucleotide alleles, useful for diagnosing haplogroups, and the haplogroup they are useful for diagnosing, are listed in Table 15. The amino acid alleles and corresponding nucleotide alleles listed in Table 15, and synonymously coding nucleotide alleles, are associated with the above-mentioned physiological conditions. Table 15 lists the set of amino acid alleles useful for diagnosing haplogroups. Column one of Table 15 lists the gene, column two lists the nucleotide locus, column three lists the useful nucleotide allele, column four lists the useful amino acid allele encoded by the useful nucleotide allele when Cambridge nucleotides are present at the other nucleotide loci of the encoding codon, and column five lists the haplogroups or sub-haplogroups, in parentheses, that contain the corresponding alleles.
The amino acid alleles (column four) can be identified individuals locate to a different environment. The methods by the codon containing the nucleotide locus (column two). of this invention associate mtDNA nucleotide alleles with For example, the proline in the ND1 gene is identified as ntl haplogroups and combine this data with native haplogroup 3796 P, where ntl signifies the codon containing the nucle geographic regions as is known in the art, to diagnose otide locus (ntl) 3796. When an individual of one of the individuals as having predispositions to late-onset clinical haplogroups listed in column five of Table 15 is diagnosed disorderS Such as obesity, diabetes, hypertension, and car with one of the above-mentioned physiological conditions diovascular disease when those individuals live in climatic by the above-mentioned method, the physiological condition and dietary environments that are disadvantageous with is associated with the presence of one of the alleles listed in respect to their mtDNA alleles. When humans having Table 15. When the haplogroup of the individual is haplo regional mtDNA alleles move into a different thermal and/or group G, the amino acid allele likely to have caused the dietary environment from the one in which the alleles were physiological condition is ntl 4833 A. When the haplogroup Selected, they are energetically imbalanced with their envi of the individual is haplogroup T, the amino acid allele is ronment, and as a result are predisposed to having metabolic selected from the group consisting of ntl 14917D, ntl 8701 diseases Such as diabetes, hypertension, cardiovascular dis T, and ntl 15452 I. When the haplogroup is haplogroup W, ease, and other diseases known to the art to be associated the amino acid allele is Selected from the group consisting of with metabolism and mitochondrial functions. The above int15046 I, int15460 T, ntl 8701 T, and ntl 15884 P. 
When the mentioned late-onset clinical disorders are rapidly becoming haplogroup is haplogroup D, the amino acid allele is Selected epidemic around the World in members of our globally from the group consisting of ntl 5178 M and ntl 8414 F. mobile Society. This invention provides a method of diag When the haplogroup is haplogroup L0, the amino acid nosing a human with a predisposition to a physiological allele is Selected from the group consisting of ntl 5442 L, ntl condition Such as, but not limited to, energetic imbalance, 7146 A, int19402 P. ntl 13105 V, and int1 13276 V. When the metabolic disease, abnormal energy metabolism, abnormal haplogroup is haplogroup L1, the amino acid allele is US 2005/O123913 A1 Jun. 9, 2005

selected from the group consisting of ntl 7146 A, ntl 7389 H, ntl 13105 V, ntl 13789 H, and ntl 14178 V. When the haplogroup is haplogroup C, the amino acid allele is selected from the group consisting of ntl 8584 T and ntl 14318 S. When the haplogroup is selected from the group consisting of haplogroups A, I, X, B, F, Y, and U, the amino acid allele is ntl 8701 T. When the haplogroup is haplogroup J, the amino acid allele is selected from the group consisting of ntl 8701 T, ntl 13708 T, and ntl 15452 I. When the haplogroup is selected from the group consisting of haplogroups V and H, the amino acid allele is selected from the group consisting of ntl 8701 T and ntl 14766 T.

[0074] Evolutionarily significant nucleotide and amino acid alleles also exist in nuclear-encoded ATP9 that are useful for diagnosing predisposition to an energy metabolism-related physiological condition such as energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, centenaria, diabetes, hypertension, and cardiovascular disease. These alleles may be identified by methods of this invention.

[0075] The evolutionarily significant amino acid alleles and corresponding nucleotide alleles are candidates for alleles causing a physiological condition for which a predisposition is diagnosable by the methods of this invention. The evolutionarily significant amino acid and nucleotide alleles identified by the methods of this invention (Table 19) are useful for gene therapy and mitochondrial replacement therapy to treat the corresponding physiological conditions. The evolutionarily significant genes, amino acid alleles, and nucleotide alleles identified by the methods of this invention are useful for identifying targets for traditional therapy, and for designing corresponding therapeutic agents. The evolutionarily significant genes and amino acid and nucleotide changes identified by the methods of this invention are useful for generating animal models of the corresponding human physiological conditions.

[0076] As is known to the art, individuals may contain more than one mitochondrial DNA allele at any given nucleotide locus. One cell contains many mitochondria, and one cell or different cells within one organism may contain genetically different mitochondria. Heteroplasmy is the occurrence of more than one type of mitochondria in an individual or sample. Varying degrees of heteroplasmy are associated with varying degrees of the physiological conditions described herein. Heteroplasmy may be identified by means known to the art, and the severity of the physiological condition associated with specific nucleotide alleles is expected to vary with the percentage of such associated alleles within the individual.

[0077] The methods of this invention are used to analyze the human mitochondrial genome in the listed examples, but the methods are also useful for analyzing other genomes and other species. The methods of this invention are useful for identifying evolutionarily significant protein-coding genes and the correspondingly encoded mutations in other genomes in addition to mitochondrial genomes, such as in nuclear and chloroplast genomes. Using human haplogroups as populations (FIG. 1), the methods of this invention are useful for identifying evolutionarily significant protein-coding genes and the corresponding evolutionarily significant alleles in human nuclear genes. The methods of this invention are also useful for identifying evolutionarily significant protein-coding genes and the corresponding alleles in many species. For example, the methods of this invention are applicable to varieties of beef or dairy cattle, or pig lines. Corn lines are divisible by phenotypic and/or molecular markers into heterotic groups that are useful populations in the methods of this invention. Using corn heterotic groups as populations, the methods of this invention are useful for identifying evolutionarily significant protein-coding genes and the corresponding mutations in the nuclear, chloroplast, and mitochondrial genomes of corn.

[0078] This invention provides isolated nucleic acid molecules containing novel nucleotide alleles of this invention in libraries. The libraries contain at least two such molecules. Preferably the molecules have unique sequences. The molecules typically have a length from about 7 to about 30 nucleotides. "About" as used herein means within about 10% (e.g., "about 30 nucleotides" means 27-33 nucleotides). However, the molecules may be longer, such as about 50 nucleotides long. A library of this invention contains at least two isolated nucleic acid molecules each containing at least one non-Cambridge nucleotide allele of this invention. A library of this invention may contain at least ten, twenty-five, fifty, 100, 500 or more isolated nucleic acid molecules, at least one of which contains a nucleotide allele of this invention. A library of this invention may contain molecules having at least two to all of the nucleotide alleles of this invention, including synonymous codings of evolutionarily significant amino acid alleles. The nucleotide alleles of this invention are defined by a nucleotide locus, the nucleotide location in the human mitochondrial genome, and by the A, G, C, T (or U) nucleotide. An isolated nucleic acid molecule, in a library of this invention, can be identified as containing a nucleotide allele of this invention, because the nucleotide allele of this invention is bounded on at least one side by its context in the mitochondrial genome. Statistically, to be unique in the human mitochondrial genome, such a molecule would need to be at least about seven nucleotides long. Statistically, to be unique in the total human genome, including the mitochondrial genome, such a molecule would need to be at least about fifteen nucleotides long. Examples of isolated nucleic acid molecules of this invention are molecules containing the following nucleotide alleles: 1) Cambridge alleles at human mtDNA nucleotide loci 168-170, non-Cambridge alleles at locus 171A, and Cambridge alleles at human mtDNA nucleotide loci 172-174; and 2) Cambridge alleles at 11940-11946, non-Cambridge alleles at 11947G, and Cambridge alleles at 11948-11954. An isolated nucleic acid molecule of this invention may contain more than one nucleotide allele of this invention. The nucleotide allele of this invention may be at any position in the isolated nucleic acid molecule. Often it is useful to have the relevant nucleotide allele in the center of the isolated nucleic acid molecule or on the 3' end of the molecule. Isolated nucleic acid molecules of this invention are useful for interrogating, that is, determining the presence or absence of, a nucleotide allele at the corresponding nucleotide locus in the mitochondrial genome in a sample containing mitochondrial nucleic acid from a human, using any method known in the art. Methods for determining the presence or absence of the nucleotide allele include allele-specific PCR and nucleic acid array hybridization or sequencing.

[0079] The alleles and libraries of this invention are useful for designing probes for nucleic acid arrays. This invention provides nucleic acid arrays having two or more nucleic acid molecules or spots (each spot comprising a plurality of substantially identical isolated nucleic acid molecules), each molecule having the sequence of an allele of this invention. The molecules on the arrays of this invention are usually about 7 to about 30 nucleotides long. The arrays are useful for detecting the presence or absence of alleles. Arrays of this invention are also useful for sequencing human mtDNA. Alleles may be selected from sets of nucleotide alleles including human mtDNA nucleotide alleles, non-Cambridge human mtDNA nucleotide alleles, human mtDNA nucleotide alleles in 48 genomes and the Cambridge sequence, non-Cambridge human mtDNA nucleotide alleles in 48 genomes, nucleotide alleles useful for diagnosing human haplogroups and macro-haplogroups, nucleotide alleles useful for diagnosing human haplogroups, and evolutionarily significant human mitochondrial nucleotide alleles as listed in the various Tables and portions of tables hereof. Arrays of this invention may contain molecules capable of interrogating all of the alleles in one of the above-mentioned sets of alleles. Genotyping arrays useful for detecting sequence polymorphisms, such as are provided by this invention, are similar to Affymetrix (Santa Clara, Calif., USA) genotyping arrays containing a Perfect Match probe (PM) and a corresponding Mismatch probe (MM). A PM probe could comprise a non-Cambridge allele at a selected nucleotide locus and the corresponding MM probe could comprise the corresponding Cambridge allele at the selected nucleotide locus. Arrays of this invention include sequencing arrays for human mtDNA.

[0080] As used herein, "array" refers to an ordered set of isolated nucleic acid molecules or spots consisting of pluralities of substantially identical isolated nucleic acid molecules. Preferably the molecules are attached to a substrate. The spots or molecules are ordered so that the location of each (on the substrate) is known and the identity of each is known. Arrays on a microscale can be called microarrays. Microarrays on solid substrates, such as glass or other ceramic slides, can be called gene chips or chips.

[0081] Arrays are preferably printed on solid substrates. Before printing, substrates such as glass slides are prepared to provide a surface useful for binding, as is known to the art. Arrays may be printed using any printing techniques and machines known in the art. Printing involves placing the probes on the substrate, attaching the probes to the substrate, and blocking the substrate to prevent non-specific hybridization. Spots are printed at known locations. Arrays may be printed on glass microscope slides. Alternatively, probes may be synthesized in known positions on prepared solid substrates (Affymetrix, Santa Clara, Calif., USA).

[0082] Arrays of this invention may contain as few as two spots, or more than about ten spots, more than about twenty-five spots, more than about one hundred spots, more than about 1000 spots, more than about 65,000 spots, or up to about several hundred thousand spots.

[0083] Using microarrays may require amplification of the target sequences of interest (generation of multiple copies of the same sequence), such as by PCR or reverse transcription. As the nucleic acid is copied, it is tagged with a fluorescent label that emits light like a light bulb. The labeled nucleic acid is introduced to the microarray and allowed to react for a period of time. This nucleic acid sticks to, or hybridizes with, the probes on the array when the probe is sufficiently complementary to the labeled, amplified, sample nucleic acid. The extra nucleic acid is washed off of the array, leaving behind only the nucleic acid that has bound to the probes. By obtaining an image of the array with a fluorescent scanner and using software to analyze the hybridized array image, it can be determined if, and to what extent, genes are switched on and off, or whether or not sequences are present, by comparing fluorescent intensities at specific locations on the array. The intensity of the signal indicates to what extent a sequence is present. In expression arrays, high fluorescent signals indicate that many copies of a gene are present in a sample, and a lower fluorescent signal shows a gene is less active. By selecting appropriate hybridization conditions and probes, this technique is useful for detecting single nucleotide polymorphisms (SNPs) and for sequencing. Methods of designing and using microarrays are continuously being improved (Relogio, A. et al. (2002) Nuc. Acids. Res. 30(11):e51; Iwasaki, H. et al. (2002) DNA Res. 9(2):59-62; and Lindroos, K. et al. (2002) Nuc. Acids. Res. 30(14):E70).

[0084] Arrays of this invention may be made by any array synthesis methods known in the art such as spotting technology or solid phase synthesis. Preferably the arrays of this invention are synthesized by solid phase synthesis using a combination of photolithography and combinatorial chemistry. Some of the key elements of probe selection and array design are common to the production of all arrays. Strategies to optimize probe hybridization, for example, are invariably included in the process of probe selection. Hybridization under particular pH, salt, and temperature conditions can be optimized by taking into account melting temperatures and by using empirical rules that correlate with desired hybridization behaviors. Computer models may be used for predicting the intensity and concentration-dependence of probe hybridization.

[0085] Detecting a particular polymorphism can be accomplished using two probes. One probe is designed to be perfectly complementary to a target sequence, and a partner probe is generated that is identical except for a single base mismatch in its center. In the Affymetrix system, these probe pairs are called the Perfect Match probe (PM) and the Mismatch probe (MM). They allow for the quantitation and subtraction of signals caused by non-specific cross-hybridization. The difference in hybridization signals between the partners, as well as their intensity ratios, serve as indicators of specific target abundance, and consequently of the sequence.

[0086] Arrays can rely on multiple probes to interrogate individual nucleotides in a sequence. The identity of a target base can be deduced using four identical probes that vary only in the target position, each containing one of the four possible bases. Alternatively, the presence of a consensus sequence can be tested using one or two probes representing specific alleles. To genotype heterozygous or genetically mixed samples, arrays with many probes can be created to provide redundant information, resulting in unequivocal genotyping.

[0087] Probes fixed on solid substrates and targets (nucleotide sequences in the sample) are combined in a hybridization buffer solution and held at an appropriate temperature until annealing occurs. Thereafter, the substrate is

washed free of extraneous materials, leaving the nucleic acids on the target bound to the fixed probe molecules, allowing for detection and quantitation by methods known in the art such as by autoradiograph, liquid scintillation counting, and/or fluorescence. As improvements are made in hybridization and detection techniques, they can be readily applied by one of ordinary skill in the art. As is well known in the art, if the probe molecules and target molecules hybridize by forming a strong non-covalent bond between the two molecules, it can be reasonably assumed that the probe and target nucleic acid are essentially identical, or almost completely complementary, if the annealing and washing steps are carried out under conditions of high stringency. The detectable label provides a means for determining whether hybridization has occurred.

[0088] When using oligonucleotides or polynucleotides as hybridization probes, the probes may be labeled. In arrays of this invention, the target may instead be labeled by means known to the art. Target may be labeled with radioactive or non-radioactive labels. Targets preferably contain fluorescent labels.

[0089] Various degrees of stringency of hybridization can be employed. The more stringent the conditions are, the greater the complementarity that is required for duplex formation. Stringency can be controlled by temperature, probe concentration, probe length, ionic strength, time, and the like. Hybridization experiments are often conducted under moderate to high stringency conditions by techniques well known in the art, as described, for example, in Keller, G. H., and M. M. Manak (1987) DNA Probes, Stockton Press, New York, N.Y., pp. 169-170, hereby incorporated by reference. However, sequencing arrays typically use lower hybridization stringencies, as is known in the art.

[0090] Moderate to high stringency conditions for hybridization are known to the art. An example of high stringency conditions for a blot is hybridizing at 68° C. in 5×SSC/5× Denhardt's solution/0.1% SDS, and washing in 0.2×SSC/0.1% SDS at room temperature. An example of conditions of moderate stringency is hybridizing at 68° C. in 5×SSC/5× Denhardt's solution/0.1% SDS and washing at 42° C. in 3×SSC. The parameters of temperature and salt concentration can be varied to achieve the desired level of sequence identity between probe and target nucleic acid. See, e.g., Sambrook et al. (1989) vide infra or Ausubel et al. (1995) Current Protocols in Molecular Biology, John Wiley & Sons, NY, N.Y., for further guidance on hybridization conditions.

[0091] The melting temperature is described by the following formula (Beltz, G. A. et al. (1983) Methods of Enzymology, R. Wu, L. Grossman and K. Moldave (Eds.), Academic Press, New York 100:266-285).

[0092] $$T_m = 81.5\,^{\circ}\mathrm{C} + 16.6\,\log[\mathrm{Na}^{+}] + 0.41\,(\%\,\mathrm{G+C}) - 0.61\,(\%\,\mathrm{formamide}) - \frac{600}{\text{length of duplex in base pairs}}$$

[0093] Washes can typically be carried out as follows: twice at room temperature for 15 minutes in 1×SSPE, 0.1% SDS (low stringency wash), and once at $T_m - 20$° C. for 15 minutes in 0.2×SSPE, 0.1% SDS (moderate stringency wash).

[0094] Nucleic acid useful in this invention can be created by Polymerase Chain Reaction (PCR) amplification. PCR products can be confirmed by agarose gel electrophoresis. PCR is a repetitive, enzymatic, primed synthesis of a nucleic acid sequence. This procedure is well known and commonly used by those skilled in this art (see Mullis, U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,800,159; Saiki et al. (1985) Science 230:1350-1354). PCR is used to enzymatically amplify a DNA fragment of interest that is flanked by two oligonucleotide primers that hybridize to opposite strands of the target sequence. The primers are oriented with the 3' ends pointing towards each other. Repeated cycles of heat denaturation of the template, annealing of the primers to their complementary sequences, and extension of the annealed primers with a DNA polymerase result in the amplification of the segment defined by the 5' ends of the PCR primers. Since the extension product of each primer can serve as a template for the other primer, each cycle essentially doubles the amount of DNA template produced in the previous cycle. This results in the exponential accumulation of the specific target fragment, up to several million-fold in a few hours. By using a thermostable DNA polymerase such as the Taq polymerase, which is isolated from the thermophilic bacterium Thermus aquaticus, the amplification process can be completely automated. Other enzymes that can be used are known to those skilled in the art.

[0095] Polynucleotide sequences of the present invention can be truncated and/or mutated such that certain of the resulting fragments and/or mutants of the original full-length sequence can retain the desired characteristics of the full-length sequence. A wide variety of restriction enzymes that are suitable for generating fragments from larger nucleic acid molecules are well known. In addition, it is well known that Bal31 exonuclease can be conveniently used for time-controlled limited digestion of DNA. See, for example, Maniatis (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York, pages 135-139, incorporated herein by reference. See also Wei et al. (1983) J. Biol. Chem. 258:13006-13512. By use of Bal31 exonuclease (commonly referred to as "erase-a-base" procedures), the ordinarily skilled artisan can remove nucleotides from either or both ends of the subject nucleic acids to generate a wide spectrum of fragments that are functionally equivalent to the subject nucleotide sequences. One of ordinary skill in the art can, in this manner, generate hundreds of fragments of controlled, varying lengths from locations all along the original molecule. The ordinarily skilled artisan can routinely test or screen the generated fragments for their characteristics and determine the utility of the fragments as taught herein. It is also well known that the mutant sequences can be easily produced with site-directed mutagenesis. See, for example, Larionov, O. A. and Nikiforov, V. G. (1982) Genetika 18(3):349-59; and Shortle, D. et al. (1981) Annu. Rev. Genet. 15:265-94, both incorporated herein by reference. The skilled artisan can routinely produce deletion-, insertion-, or substitution-type mutations and identify those resulting mutants that contain the desired characteristics of wild-type sequences, or fragments thereof.

[0096] Percent sequence identity of two nucleic acids may be determined using the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87:2264-2268, modified as in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-5877. Such an algorithm is incorporated into the NBLAST and XBLAST programs of Altschul et al. (1990) J. Mol. Biol. 215:402-410. BLAST nucleotide searches are performed with the NBLAST program, score=100, wordlength=12, to obtain nucleotide sequences with the desired percent sequence identity. To obtain gapped alignments for comparison purposes, Gapped BLAST is used as described in Altschul et al. (1997) Nucl. Acids. Res. 25:3389-3402. When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (NBLAST and XBLAST) are used. See http://www.ncbi.nih.gov.

[0097] Standard techniques for cloning, DNA isolation, amplification and purification, for enzymatic reactions involving DNA ligase, DNA polymerase, restriction endonucleases and the like, and various separation techniques useful herein are those known and commonly employed by those skilled in the art. A number of standard techniques are described in Sambrook et al. (1989) Molecular Cloning, Second Edition, Cold Spring Harbor Laboratory, Plainview, N.Y.; Maniatis et al. (1982) Molecular Cloning, Cold Spring Harbor Laboratory, Plainview, N.Y.; Wu (ed.) (1993) Meth. Enzymol. 218, Part I; Wu (ed.) (1979) Meth. Enzymol. 68; Wu et al. (eds.) (1983) Meth. Enzymol. 100 and 101; Grossman and Moldave (eds.) Meth. Enzymol. 65; Miller (ed.) (1972) Experiments in Molecular Genetics, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.; Old and Primrose (1981) Principles of Gene Manipulation, University of California Press, Berkeley; Schleif and Wensink (1982) Practical Methods in Molecular Biology; Glover (Ed.) (1985) DNA Cloning Vol. I and II, IRL Press, Oxford, UK; Hames and Higgins (Eds.) (1985) Nucleic Acid Hybridization, IRL Press, Oxford, UK; Setlow and Hollaender (1979) Genetic Engineering: Principles and Methods, Vols. 1-4, Plenum Press, New York; and Ausubel et al. (1992) Current Protocols in Molecular Biology, Greene/Wiley, New York, N.Y. Abbreviations and nomenclature, where employed, are deemed standard in the field and commonly used in professional journals such as those cited herein.

[0098] This invention provides machine-readable storage devices and program storage devices having data and methods for diagnosing haplogroups and physiological conditions. One program storage device provided by this invention contains the program steps: a) determining the haplogroup of a sample from an individual using nucleotide sequence data from nucleic acid in the sample; b) associating the haplogroup with information identifying the geographic region of the individual; c) comparing the haplogroup and geographic region of the sample to the set of haplogroups native to the geographic region of the individual; and d) diagnosing the individual with a predisposition to an energy metabolism-related physiological condition if the haplogroup of the individual is not within the set of haplogroups native to the geographic region of the individual; all said program steps being encoded in machine-readable form, and all said information encoded in machine-readable form. This invention also provides a data set, encoded in machine-readable form, containing nucleotide alleles listed in Table 19, with each allele associated with encoded information identifying a physiological condition in humans. These physiological conditions are energy metabolism-related conditions including energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease. This storage device may also contain information associating each allele with one or more native geographic regions. A program storage device provided by this invention contains input means for inputting the haplogroup of an individual and the geographic region of that individual, and contains information associating alleles with native geographic regions, and program steps for diagnosing the individual with a predisposition to a physiological condition. A storage device containing a data set in machine-readable form provided by this invention may include encoded information comprising amino acid alleles listed in Table 19, with each allele associated with a physiological condition in humans.

[0099] It will be appreciated by those of ordinary skill in the art that populations, subpopulations, organelles, and amino acid and nucleotide sequence comparison methods, neutrality test methods, nucleotide sequencing methods, codons, samples, sample collection techniques, sample preparation techniques, probes, probe generation techniques, genes involved in mitochondrial biology, hybridization techniques, array printing techniques, physiological conditions, cell lines, mutant strains, organisms, tissues, solid substrates, machine-readable storage devices, program devices, and methods of data analyses other than those specifically disclosed herein are available in the art and can be employed in the practice of this invention. All art-known functional equivalents are intended to be encompassed within the scope of this invention.

[0100] The following examples are provided for illustrative purposes, and are not intended to limit the scope of the invention as claimed here. Any variations in the compositions and methods exemplified that occur to the skilled artisan are intended to fall within the scope of the present invention.

EXAMPLES

Example 1

[0101] This invention provides human mtDNA polymorphisms found in all the major human haplogroups. Table 3 shows naturally occurring nucleotide alleles identified in the complete mtDNA sequences of 103 individuals, as compared to the mtDNA Cambridge sequence. All nucleotide sequences not listed are identical to the Cambridge sequence. Nucleotide alleles previously known to be associated with disease conditions, such as those listed in Table 1, are not listed in Table 3. Some deletion or rearrangement polymorphisms have also been excluded. All polymorphisms listed are nucleotide substitutions except for a nine adenine nucleotide deletion at positions 8271-8279.

TABLE 3
Human MtDNA Nucleotide Alleles

nucleotide locus    Cambridge alleles    non-Cambridge alleles
64                  C                    T
72                  T                    C
73                  A                    G
89                  T                    C
93                  A                    G

TABLE 3-continued

Human mtDNA Nucleotide Alleles

Nucleotide locus   Cambridge alleles   Non-Cambridge alleles

95 710 14 721 43 750 46 769 50 825 51 827 52 850 53 921 71 930 8O 961 82 961 83 O18 85 O41 85 O48 86 119 89 189 89 243 94 290 95 382 95 4O6 98 415 99 42O 2OO 438 2O4 442 2O7 503 208 598 210 700 212 703 215 7O6 217 709 225 715 227 719 228 736 235 738 236 78O 247 811 250 888 252 927 263 2OOO 291 2060 295 2092 297 2.245 316 2.245 317 2263 317 2308 32O 2332 325 2352 340 2358 357 238O 373 2416 400 2483 408 2581 418 2639 456 26SO 462 27O6 465 2755 467 2758 471 2768 48O 2789 482 2792 489 2834 493 2836 499 2857 508 2863 593 2885 597 3O10 663 3O83 678 3197 68O 32OO 709 32O2 US 2005/O123913 A1 Jun. 9, 2005 17

TABLE 3-continued

Human mtDNA Nucleotide Alleles

Nucleotide locus   Cambridge alleles   Non-Cambridge alleles

3204 586 32O6 596 3221 646 3290 655 33O8 688 3.316 695 3372 715 3394 742 3438 767 3450 769 3480 82O 3505 824 3513 833 3516 841 3516 883 3547 907 3549 917 3552 96.O 3552 977 3565 4 994 3594 SOO4 3644 5027 3666 SO36 3693 SO43 3699 SO46 3.720 5063 3756 5096 3796 5108 3796 5147 3796 5153 38O8 5178 3816 5231 3834 5237 3843 5255 3847 5262 3866 5263 3918 5285 3921 53OO 3927 533O 3970 5331 3981 5390 4O25 5393 O4 O 5417 O4 5426 5442 546O 5465 1. 54.71 5492 5495 558O 5581 2 56O1 5603 5606 5633 5655 5711 5773 5811 5814 5821 5826 5843 5951 5984 5987 6O26 6O29 6O45

TABLE 3-continued

Human mtDNA Nucleotide Alleles

Nucleotide locus   Cambridge alleles   Non-Cambridge alleles

08 6274 G A 11 6278 C T 14 6284 A G 24 6286 C G 26 6287 C T 29 6288 T C 29 6290 C T 40 629 C T 44 6292 C T 45 6293 A G 47 6294 C T 48 6296 C T 53 6298 T C 62 6304 T C 63 6309 A G 66 631 T C 67 6316 A G 68 6317 A T 69 6318 A T 71 6319 G A 72 6320 C T 75 6324 T C 76 6325 T C 82 6326 A G 83 6327 C T 84 6343 A G 85 6344 C T 86 6354 C T 87 6355 C T 88 6356 T C 88 6357 T C 89 6360 C T 92 6362 T C 93 6366 C T 6207 6368 T C 6209 6390 G A 6212 6391 G A 6213 6399 A G 6214 6438 G A 6217 6439 C A 6219 6483 G A 6223 6519 T C 6224 6527 C T

6227 6229 6230 6231 6232 6234 6235 6239 6241 6242 6243 6245 6247 6249 6254 6255 6256 6257 6258 6260 6261 6264 6265 6266 6268 6270 6271

[0102] Table 4 lists the nucleotide alleles identified in 48 mitochondrial genomes as compared to the Cambridge sequence.

TABLE 4

Human mtDNA Nucleotide Alleles in 48 Genomes

Nucleotide locus   Cambridge alleles   Non-Cambridge alleles
64                 C                   T
72                 T                   C
73                 A                   G
89                 T                   C
93                 A                   G
95                 A                   C
114                C                   T
146                T                   C
150                C                   T
151                C                   T
152                T                   C

TABLE 4-continued

Human mtDNA Nucleotide Alleles in 48 Genomes

Nucleotide locus   Cambridge alleles   Non-Cambridge alleles

225 668O 232 6713 248 6734 312 6752 336 6776 370 6815 454 6827 529 6962 529 6989 58O 7028 586 7052 596 7055 646 71.46 715 71.54 767 7175 769 71.96 82O 7256 824 727 833 7274 841 7389 883 7424 907 7476 917 752 96.O 756 4 977 7600 5027 7624 SO36 7664 SO43 7694 SO46 7765 5096 777 5108 7864 5147 7867 5153 7933 5178 7999 5231 8O27 53OO 808O 5331 8087 5390 8113 5393 8142 5417 81.49 5426 8152 5442 8155 546O 81.85 5465 82OO 547 82O6 5495 8248 558 8251 560 8260 5603 8269 5606 82.71-8279 5633 8286 571 8298 5773 8344 5814 8387 595 8389 8392 8414 8428 84.48 846O 8468 8472 8545 8553 85.63 8566 8584 86.18 8655 8697

TABLE 4-continued

Human mtDNA Nucleotide Alleles in 48 Genomes

Nucleotide locus   Cambridge alleles   Non-Cambridge alleles

272O 566 2738 569 2810 668 2822 693 2882 766 293O 783 2948 793 2967 798 2972 836 3O2O 861 3068 905 31O1 911 3104 4 974. 3105 SO34 3.194 SO43 3263 5110 3.276 5115 3368 51.36 3440 5172 3485 5204 3494 5217 3500 5218 3506 5238 3512 5257 3563 5261 3590 5301 3617 5317 3650 5318 3708 5323 3734 5326 37.59 5431 378O 5442 3789 5452 3803 5466 3812 54.87 3827 5497 388O 5519 3886 5.535 3914 56O7 3924 566 3928 5724 3958 5766 3966 57.84 4OOO 5793 O16 O34 O59 O70 O88 118 128 148 167 178 2OO 2O3 215 221 233 272 284 3O8 318 374 459 470 484 488 5O2 560 US 2005/O123913 A1 Jun. 9, 2005 27

TABLE 4-continued

Human mtDNA Nucleotide Alleles in 48 Genomes

Nucleotide locus   Cambridge alleles   Non-Cambridge alleles

29 16360 C T 45 16362 T C 48 16366 C T 53 16368 T C 62 16390 G A 63 16391 G A 67 16399 A G 68 16519 T C

72 76 82 83 84 85 86 87 88 88 89 92 6 93 6212 6213 6214 6217 6219 6223 6224 6227 6229 6230 6231 6232 6234 6235 6239 6243 6245 6249 6254 6255 6256 6258 6260

Example 2

[0103] The mtDNA sequences of Example 1 were chosen because they represent all of the major haplogroup lineages in humans. Analysis of these sequences has reaffirmed that all human mtDNAs belong to a single maternal tree, rooted in Africa (R. L. Cann et al., Nature 325:31-36 (1987); M. J. Johnson et al., (1983) Journal of Molecular Evolution 19:255-271; D. C. Wallace et al., "Global Mitochondrial DNA Variation and the Origin of Native Americans" in The Origin of Humankind, M. Aloisi, B. Battaglia, E. Carafoli, G. A. Danieli, Eds., Venice (IOS Press, 2000); M. Ingman et al., (2000) Nature 408:708-13; and D. C. Wallace et al., (1999) Gene 238:211-230). A cladogram of these mtDNA sequences is shown in FIG. 1. Haplogroups are designated on branches of the tree. A calibration of the sequence evolution rate for the coding regions of the mtDNA, based on a human-chimpanzee divergence time of 6.5 million years ago (MYA) (M. Goodman et al., (1998) Mol. Phylogenet. Evol. 9:585-98), has permitted an estimate of the time to the most recent common ancestor (MRCA) of the human mtDNA phylogeny at ~200,000 years before present (YBP), and an estimate of the time of the MRCA for each major haplogroup (Table 5).
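The calibration described above amounts to dividing a sequence divergence by a rate, and the arithmetic can be sketched as follows. This is only an illustration, assuming that the divergence values reported in Table 5 are in units of 10^-4 substitutions per nucleotide position and the times in thousands of years (an inference from the reported rate of about 1.26 × 10^-8 per nucleotide per year). The function and variable names are hypothetical, and the sketch does not reproduce the maximum-likelihood HKY85 estimation itself.

```python
# Illustrative conversion of sequence divergence to coalescence time.
# Numeric inputs are the values reported in Table 5; the units (1e-4
# substitutions per nucleotide position) are an assumption inferred from
# the reported evolution rate. All names here are hypothetical.

CALIBRATION_YEARS = 6.5e6          # human-chimpanzee divergence time
CHIMP_HUMAN_DIVERGENCE = 818.05e-4  # substitutions per nucleotide position


def substitution_rate(divergence, years):
    """Average substitutions per nucleotide position per year."""
    return divergence / years


def coalescence_time(divergence, rate):
    """Years to the most recent common ancestor for a given divergence."""
    return divergence / rate


rate = substitution_rate(CHIMP_HUMAN_DIVERGENCE, CALIBRATION_YEARS)

# Divergence values for three of the groups listed in Table 5.
divergences = {"humans": 24.88e-4, "N": 8.09e-4, "H": 2.40e-4}
times = {name: coalescence_time(d, rate) for name, d in divergences.items()}

for name, years in times.items():
    print(f"{name}: about {years / 1000:.1f} thousand years")
```

Under these assumptions the computed rate and times agree with the reported rate and with the Table 5 entries for the same groups to the precision shown there.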

TABLE 5

Coalescence dates for haplogroups*

Haplogroup      Sample sizes   Time to MRCA ± s.e.           Time to MRCA ± s.e.
                               (×10^-4 mutations per np)     (×10^3 years)
chimp + human   1 + 104        818.05 ± 0.75                 6,500
humans          104            24.88 ± 0.90                  198 ± 19
L0              8              17.92 ± 1.87                  142 ± 17
L1              9              17.81 ± 1.77                  142 ± 17
L2              7              11.57 ± 1.30                  91.9 ± 11.8
N               50             8.09 ± 0.53                   64.3 ± 5.8
A               4              4.06 ± 0.92                   32.3 ± 7.6
R               37             7.66 ± 0.51                   60.9 ± 5.5
HV              15             3.61 ± 0.73                   28.7 ± 6.1
H               11             2.40 ± 0.40                   19.1 ± 3.4
V               3              1.71 ± 0.60                   13.6 ± 4.8
JT              7              6.29 ± 0.74                   50.0 ± 6.7
J               4              4.33 ± 0.87                   34.4 ± 7.2
T               3              1.40 ± 0.55                   11.1 ± 4.4
U               4              6.51 ± 0.66                   51.7 ± 6.2
M               22             8.15 ± 0.74                   64.8 ± 7.1
CZ              10             5.91 ± 0.87                   47.0 ± 7.6
C               9              3.56 ± 0.65                   28.3 ± 5.5
D               6              4.19 ± 0.67                   33.3 ± 5.7
G               3              4.75 ± 0.93                   37.7 ± 7.8

* The high probability of reverse mutations in the control region led us to calculate the times to the MRCAs using the entire mtDNA, excluding the control region (np 577-16023). Based on this value we estimated the average sequence evolution rate as (1.26 ± 0.08) × 10^-8 per nucleotide per year, using the HKY85 model (M. Hasegawa et al., (1985) J. Mol. Evol. 22:160-74). Standard errors calculated from the inverse Hessian at the maximum of the likelihood do not include any uncertainty in the calibration point, and were calculated using the delta method. The coalescence times of the various haplogroups may well be underestimated because of their small sample size.

Example 3

[0104] Inter-Continental Founder Events

[0105] The most striking feature of the mtDNA tree is the remarkable reduction in the number of mtDNA lineages that is associated with the transition from one continent to another. For example, when humans moved to Eurasia from Africa, the number of mitochondrial lineages was reduced from dozens to two lineages. While northeastern Africa encompasses the entire range of African mtDNA variation from the exclusively African haplogroups L0-L2 to the progenitors of the European and Asian mtDNA lineages, only two African mtDNA lineages, macro-haplogroups M and N, which arose about 65,000 YBP, left Africa to colonize Eurasia. Moreover, the times of the MRCAs of macro-haplogroups M and N as well as sub-macro-haplogroup R are similar, suggesting rapid population expansion associated with the colonization of Eurasia.

[0106] Similarly, when humans later moved from Central Asia to the Americas, the number of lineages was again reduced from dozens to about five. There is great mtDNA diversity in Asia, yet this diversity is substantially reduced in Siberia, and only five mtDNA haplogroups (A, B, C, D, and X), which arose in Asia about 28,000-34,000 YBP, successfully crossed the Bering land bridge to occupy the Americas. Human mtDNA haplogroup migrations are depicted in FIG. 2.

Example 4

[0107] Further analysis demonstrated which alleles are descriptive of the major haplogroups, selected sub-haplogroups, and selected macro-haplogroups. The mtDNA nucleotide positions and the relevant alleles are shown in FIG. 3. The data is arranged as a cladogram, such that a group on the left contains all of the alleles to its right. A vertical bar designates that the alleles to the right of the bar are present in all of the groups to the left of the bar. The haplogroup data in FIG. 3 is summarized in Tables 6 and 7. The sub-haplogroup data is summarized in Tables 8 and 9. Each group contains the alleles listed below it.
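The cladogram convention, in which a group carries its own alleles together with every allele shared further along its branch, can be illustrated with a small nested structure. The sketch below is a hypothetical miniature, not a transcription of FIG. 3: it uses a few alleles that appear in Table 6, and the placement of 489C and 10400T on a shared branch labelled "M" above haplogroups C and D is an assumption made for illustration.

```python
# Hypothetical miniature of the cladogram convention: each group stores only
# the alleles private to it plus a link to its parent branch, and the full
# allele set is collected by walking up to the root. The grouping under "M"
# is an illustrative assumption, not data taken from FIG. 3.

CLADOGRAM = {
    "M": {"parent": None, "alleles": {"489C", "10400T"}},
    "C": {"parent": "M", "alleles": {"3552C", "4715G"}},
    "D": {"parent": "M", "alleles": {"4883T", "5178A"}},
}


def full_allele_set(group, tree=CLADOGRAM):
    """Return the alleles of a group plus those inherited from its ancestors."""
    alleles = set()
    while group is not None:
        node = tree[group]
        alleles |= node["alleles"]
        group = node["parent"]
    return alleles


print(sorted(full_allele_set("C")))
print(sorted(full_allele_set("D")))
```

Storing only the private alleles per branch keeps shared alleles in one place, which mirrors how the tables repeat a shared allele under every group that contains it.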

TABLE 6

L0 L1 L2 L3 C D E G Z

1048T 2352C 325T 2352C 35S2C 4-883T 16227G 4833G 11078G 3516A 3796C 68OC 86.18C 4715G 5178A 82OOC 16185T 4312T 5951G 2416C 10O86C 7196A 8414T 16O17C 16224C 4586C 5984G 2758G 10398A 8584A 14668T 16129A 1626OT 5442C 6O71C 4158G 10819G 95.45G 15487T 6185C 9072G 82O6A 14212C 13263G 16362C 16362C 8113A 10586A 9221G 16124C 14318C 8251A 1281 OG 11944C 16278T 16298C 16298C 9347G 13485G 138O3G 16362C 16327T 94O2C 3666A 13958C 489C 489C 489C 489C 489C 98.18T 7055G 16278T 104OOT 104OOT 104OOT 104OOT 104OOT 10589A 7389C 16390G 14783C 14783C 14783C 14783C 14783C 10664T 13789C 15043A 15043A 15043A 15043A 15043A 10915C 14178C 15301A 15301A 15301A 15301A 15301A 15301A 15301A 12OO7A 13276G 13506T 825A 825A 2758A 2758A 2885C 2885C 7146G 71.46G 8468T 8468T 8655T 8655T 10688A 10688A 1081OC 10810C 13105G 131OSG 769A 769A 1018A 1018A 3594T 3594T 4104G 4-104G 7256T 7256T 7521A 7521A 1365OT 1365OT


0110

TABLE 9

T:

90SSA 16318T 16172C 3197C 4646C 16343G 15907G 13104G 1812G 16224C 16219G. 7768G 11332T 16051G 14O7OG 4233G 16311T 1627OT 16356C 16129C 16189C 16249C 11467G 11467G 11467G 11467G 11467G 11467G 11467G 11467G 888A 123O8G 123O8G 123O8G 123O8G 123O8G 123O8G 123O8G 123O8G 4 917G. 4 12372A 12372A 12372A 12372A 12372A 12372A 12372A 12372A 8697A

4 4 1251G 5452A 6126C 12705C 12705C 12705C 12705C 12705C 12705C 12705C 12705C 2705C 16223C 16223C 16223C 16223C 16223C 16223C 16223C 16223C 6223C 87O1A 87O1A 87O1A 87O1A 87O1A 87O1A 87O1A 87O1A 954OT 954OT 954OT 954OT 954OT 954OT 954OT 954OT 10873.T 10873.T 10873.T 10873T 10873.T 10873.T 10873.T 10873.T

Example 5

[0111] Further analysis of the data in FIG. 3 demonstrated sets of nucleotide alleles useful for diagnosing the haplogroups. A set of nucleotide alleles useful for diagnosing all of the haplogroups and sub-haplogroups in FIG. 3 is listed in Table 10. There are many equivalent methods for diagnosing the haplogroups. Examples of methods requiring testing only one or a few loci follow. Alleles are identified in human samples containing mtDNA. Haplogroup L0 can be diagnosed by identifying 4586C, 9818T, or 8113A. Haplogroup L1 can be diagnosed by identifying 825A, 2758A, 2885C, 7146G, 8468T, 8655T, 10688A, 10810C, or 13105G. Haplogroup L2 can be diagnosed by identifying 2416C, 2758G, 8206A, 9221G, 11944C, or 16390G. Haplogroup L3 can be diagnosed by identifying 10819G, 14212C, 8618C, 10086C, 16362C, 10398A, or 16124C. Haplogroup C can be diagnosed by identifying 3552C, 4715G, 7196A, 8584A, 9545G, 13263G, 14318C, or 16327T. Haplogroup D can be diagnosed by identifying 4883T, 5178A, 8414T, 14668T, or 15487T. Haplogroup E can be diagnosed by identifying 16227G. Haplogroup G can be diagnosed by identifying 4833G, 8200C, or 16017C. Haplogroup Z can be diagnosed by identifying 11078G, 16185T, or 16260T. Haplogroup A can be diagnosed by identifying 663G, 16290T, or 16319A. Haplogroup I can be diagnosed by identifying 4529T, 10034C, or 16391A. Haplogroup W can be diagnosed by identifying 204C, 207A, 1243C, 5046A, 5460A, 8994A, 11947G, 15884C, or 16292T. Haplogroup X can be diagnosed by identifying 1719A, 3516G, 6221C, or 14470C. Haplogroup F can be diagnosed by identifying 12406A or 16304C. Haplogroup Y can be diagnosed by identifying 7933G, 8392A, 16231C, or 16266T. Haplogroup U can be diagnosed by identifying 3197C, 4646C, 7768G, 9055A, 11332T, 13104G, 14070G, 15907G, 16051G, 16129C, 16172C, 16219G, 16249C, 16270T, 16311T, 16318T, 16343G, or 16356C. Haplogroup J can be diagnosed by identifying 295T, 12612G, 13708A, or 16069T. Haplogroup T can be diagnosed by identifying 11812G, 12633T, 14233G, 16163C, 16186T, 1888A, 4917G, 8697A, 10463C, 13368A, 14905A, 15607G, 15928A, or 16294T. Haplogroup V can be diagnosed by identifying 72C, 4580A, or 15904T. Haplogroup H can be diagnosed by identifying 2706A or 7028C. Diagnosis of haplogroup B is more complicated, requiring three steps. Haplogroup B can be diagnosed by identifying 16189C; and by identifying the absence of 1719A, 3516G, 6221C, 14470C, or 16278T; and by identifying the absence of 1888A, 4216C, 4917G, 8697A, 10463C, 11251G, 11467G, 12308G, 12372A, 12633T, 13104G, 13368A, 14070G, 14905A, 15452A, 15607G, 15928A, 16126C, 16163C, 16186T, 16249C, or 16294T.

TABLE 10

Nucleotide Alleles Useful for Diagnosing Human Haplogroups

72, 204, 207, 295, 663, 825, 1243, 1719, 1888, 2416, 2706, 2758, 2758, 2885, 3197, 3516, 3552, 4216, 4529, 4580, 4586, 4646, 4715, 4833, 4883

TABLE 10-continued

Nucleotide Alleles Useful for Diagnosing Human Haplogroups

4917, 5046, 5178, 5460, 6221, 7028, 7146, 7196, 7768, 7933, 8113, 8200, 8206, 8392, 8414, 8468, 8584, 8618, 8655, 8697, 8994, 9055, 9221, 9545, 9818, 10034, 10086, 10398, 10463, 10688, 10810, 10819, 11078, 11251, 11332, 11467, 11812, 11944, 11947, 12308, 12372, 12406, 12612, 12633, 13104, 13105, 13263, 13368, 13708, 14070, 14212, 14233, 14318, 14470, 14668, 14905, 15301, 15452, 15487, 15607, 15884, 15904, 15907, 15928, 16017, 16051, 16069, 16124, 16126, 16129, 16163, 16172, 16185, 16186, 16219, 16227, 16231, 16249, 16260, 16266, 16270, 16278, 16290, 16292, 16294, 16304, 16311, 16318, 16319, 16327, 16343, 16356, 16362, 16390, 16391

[0112] Additional alleles are included in Table 11. These alleles are useful for designing methods equivalent to those described above for diagnosing the haplogroups. Alleles in Table 11 are useful for designing efficient methods for diagnosing macro-haplogroups. The data in Tables 10 and 11 and FIG. 3 are also useful for identifying sub-haplogroups. This invention provides a method for diagnosing sub-haplogroup L1a1 by identifying in a human sample one of the nucleotide alleles selected from the group consisting of 4586C and 9818T. This invention provides a method for diagnosing sub-haplogroup L1a2 by identifying in a human sample one of the nucleotide alleles selected from the group consisting of 8113A and 8251A. This invention provides a method for diagnosing sub-haplogroup L1b1 by identifying in a human sample the nucleotide allele 2352C and one of the nucleotide alleles selected from the group consisting of 3666A, 7055G, 7389C, 13789C, and 14178C. This invention provides a method for diagnosing sub-haplogroup L1b2 by identifying in a human sample one of the nucleotide alleles selected from the group consisting of 3796C, 5951G, 5984G, 6071C, 9072G, 10586A, 12810G, and 13485G. This invention provides a method for diagnosing sub-haplogroup L2a by identifying in a human sample the nucleotide allele 13803G. This invention provides a method for diagnosing sub-haplogroup L2b by identifying in a human sample the nucleotide allele 4158G. This invention provides a method for diagnosing sub-haplogroup L2c by identifying in a human sample one of the nucleotide alleles selected from the group consisting of 325T, 680C, and 13958C. This invention provides a method for diagnosing sub-haplogroup L3a by identifying in a human sample one of the nucleotide alleles selected from the group consisting of 2325C, 10819G, and 14212C. This invention provides a method for diagnosing sub-haplogroup L3b by identifying in a human sample the nucleotide allele 8618C. This invention provides a method for diagnosing sub-haplogroup L3c by identifying in a human sample the nucleotide allele 10086C. This invention provides a method for diagnosing sub-haplogroup L3d by identifying in a human sample the nucleotide allele 10398A. This invention provides a method for diagnosing sub-haplogroup Uk by identifying in a human sample one of the nucleotide alleles selected from the group consisting of 9055A and 16311T. This invention provides a method for diagnosing sub-haplogroup U7 by identifying in a human sample the nucleotide allele 16318T. This invention provides a method for diagnosing sub-haplogroup U6 by identifying in a human sample one of the nucleotide alleles selected from the group consisting of 16172C and 16219G. This invention provides a method for diagnosing sub-haplogroup U5 by identifying in a human sample one of the nucleotide alleles selected from the group consisting of 3197C, 7768G, and 16270T. This invention provides a method for diagnosing sub-haplogroup U4 by identifying in a human sample one of the nucleotide alleles selected from the group consisting of 4646C, 11332T, and 16356C. This invention provides a method for diagnosing sub-haplogroup U3 by identifying in a human sample the nucleotide allele 16343G. This invention provides a method for diagnosing sub-haplogroup U2 by identifying in a human sample one of the nucleotide alleles selected from the group consisting of 15907G, 16051G, and 16129C. This invention provides a method for diagnosing sub-haplogroup U1 by identifying in a human sample one of the nucleotide alleles selected from the group consisting of 13104G, 14070G, 16189C, and 16249C. This invention provides a method for diagnosing sub-haplogroup T by identifying in a human sample one of the nucleotide alleles selected from the group consisting of 11812G and 14233G. This invention provides a method for diagnosing sub-haplogroup T1 by identifying in a human sample one of the nucleotide alleles selected from the group consisting of 12633T, 16163C, and 16186T.

TABLE 11

Nucleotide Alleles Useful for Diagnosing Human Haplogroups and Macro-Haplogroups

72 C; 73 A; 295 T; 325 T; 489 C; 663 G; 680 C; 769 A; 825 A; 1018 A; 1048 T; 1243 C; 1719 A; 1888 A; 2352 C; 2416 C; 2706 A; 2758 A; 2758 G; 2885 C; 3197 C; 3516 A; 3516 G; 3552 C; 3594 T; 3666 A; 3796 C; 4104 G; 4158 G; 4216 C; 4312 T; 4529 T; 4580 A

TABLE 11-continued

4715 G; 4833 G; 4883 T; 5.; 5178 A; 5442 C; 5460 A; S.; 6071 C; 6185 C; 6221 C; 2.; 7146 G; 7196 A; 7256 T; t k; 7768 G; 7933 G; 8113 A; s k; 8251 A; 8392 A; i. T; 8584 A; 8618 C; 8655 T; 8697 A; 8701 A; 8994 A; 9055 A; 9072 G; 9221 G; 9540 T; 9545 G; 9818 T; 10034 C; 10086 C; 10398 A; 10400 T; 10463 C; 10586 A; 10589 A; 10664 T; 10688 A; 10810 C; 10819 G; 10873 T; 10915 C; 11078 G; 11251 G; 11332 T; 11467 G; 11719 G; 11812 G; 11944 C; 11947 G; 12007 A; 12308 G; 12372 A; 12406 A; 12612 G; 12633 T; 12705 C

TABLE 11-continued

Nucleotide Alleles Useful for Diagnosing Human Haplogroups and Macro-Haplogroups

12810, 13104, 13105, 13263, 13276, 13368, 13485, 13506, 13650, 13708, 13789, 13803, 13958, 14070, 14178, 14212, 14233, 14318, 14470, 14668, 14766, 14783, 14905, 15043, 15301, 15452, 15487, 15607, 15884, 15904, 15907, 15928, 16017, 16051, 16069, 16124, 16126, 16129, 16129, 16163, 16172, 16185, 16186, 16189, 16219, 16223, 16224, 16227, 16231, 16249, 16260, 16266, 16270, 16278, 16290, 16292, 16294, 16298, 16304, 16311, 16318, 16319, 16327, 16343, 16356, 16362, 16390, 16391

[0113] An equivalent method for diagnosing a haplogroup is diagnosing haplogroup L0 by identifying the presence of one of 825A, 2758A, 2885C, 7146G, 8468T, 8655T, 10688A, 10810C, or 13105G; and identifying the absence of one of 3666A, 7055G, 7389C, 13789C, or 14178C. Other equivalent methods can be derived from the data in FIG. 3, and are within the scope of this invention.

Example 6

[0114] Leber's Hereditary Optic Neuropathy (LHON) is a form of blindness caused by mitochondrial DNA (mtDNA) mutations. Four mutations, 3460A, 11778A, 14484C, and 14459A, account for over 90% of LHON worldwide and are designated "primary" mutations. Primary mutations strongly predispose carriers to LHON, are not found in controls, are all in Complex I genes, and do not co-occur with each other. It has been demonstrated that the 11778A and 14484C mutations occurred more frequently than expected in association with European mtDNA haplogroup J (found in 9% of European-derived mtDNAs), suggesting that a synergistic interaction among mtDNA mutations increased the probability of disease expression. Sequence analysis of two Russian LHON families without primary LHON mutations, including removal of nucleotide alleles listed in Table 3, demonstrated two new Complex I mutations, 3635A and 4640C. Venous blood samples were obtained from the family members. Genomic DNA was isolated from the buffy coat blood fraction using Chelex 100 (Cetus, Emeryville, Calif., USA). mtDNA was amplified by PCR in 2-3 kb fragments, purified on Centricon 100 columns, and cycle-sequenced using BigDye Terminators (ABI/PerkinElmer Cetus) and an ABI Prism 377 automated DNA sequencer. The mutations were confirmed using mutation-specific restriction enzyme digestion following mismatched-primer PCR amplification of white blood cell mtDNA (Brown M. D. et al., (1995) Human Mutat. 6:311-325).

Example 7

[0115] A new primary LHON mtDNA mutation, 10663C, affecting a Complex I gene was homoplasmic in 3 Caucasian LHON families, all of which belonged to haplogroup J. These 3 families were the only haplogroup J-associated LHON families (out of 17) that did not harbor a known, primary LHON mutation. Comprehensive phylogenetic analysis of haplogroup J using complete mtDNA sequences demonstrated that the 10663C variant has arisen 3 independent times on this background. This mutation was not present in over 200 non-haplogroup J European controls, 74 haplogroup J patient and control mtDNAs, or 36 putative LHON patients without primary mutations. A partial Complex I defect was found in 10663C-containing lymphoblast and cybrid mitochondria. Thus, the 10663C mutation has occurred three independent times, each time on haplogroup J and only in LHON patients without a known LHON mutation. This makes the 10663C mutation unique among all pathogenic mtDNA mutations in that it appears to require the genetic background provided by haplogroup J for expression. These results provide further evidence for the predisposing role of haplogroup J and for the paradigm of "mild" mtDNA mutations interacting in an additive way to precipitate disease expression. Europeans with the mild ND6 np 14484 and ND3 np 10663 Leber's Hereditary Optic Neuropathy (LHON) missense mutations are more prone to blindness if they also possess the mtDNA haplogroup J.
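The presence and absence tests of Example 5 and paragraph [0113] lend themselves to a simple rule table. The following is a minimal sketch for four haplogroups only, using the alleles stated in Example 5; the rule layout (`any_of`, `none_of`), the function name, and the representation of a sample as a set of allele strings are assumptions made for illustration, not a format defined by this disclosure.

```python
# Illustrative presence/absence rules for haplogroups H, V, X and B, with
# alleles taken from Example 5. The data layout and function name are
# hypothetical; a real assay would cover all haplogroups and handle
# missing or ambiguous calls.

RULES = {
    "H": {"any_of": {"2706A", "7028C"}, "none_of": set()},
    "V": {"any_of": {"72C", "4580A", "15904T"}, "none_of": set()},
    "X": {"any_of": {"1719A", "3516G", "6221C", "14470C"}, "none_of": set()},
    # Haplogroup B: 16189C present, and none of the two exclusion lists.
    "B": {
        "any_of": {"16189C"},
        "none_of": {
            "1719A", "3516G", "6221C", "14470C", "16278T",
            "1888A", "4216C", "4917G", "8697A", "10463C", "11251G",
            "11467G", "12308G", "12372A", "12633T", "13104G", "13368A",
            "14070G", "14905A", "15452A", "15607G", "15928A", "16126C",
            "16163C", "16186T", "16249C", "16294T",
        },
    },
}


def diagnose(sample_alleles):
    """Return the haplogroups whose rule is satisfied by the observed alleles."""
    found = set(sample_alleles)
    return sorted(
        name
        for name, rule in RULES.items()
        if found & rule["any_of"] and not (found & rule["none_of"])
    )


print(diagnose({"16189C"}))            # 16189C alone satisfies the B rule
print(diagnose({"16189C", "1719A"}))   # 1719A excludes B and indicates X
```

Keeping the exclusion alleles in the rule makes the three-step test for haplogroup B a single set comparison.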

Example 8

[0116] To assess the importance of demographic factors in inter-continental mtDNA sequence radiation, deviations from the standard neutral model were tested for in the distribution of mtDNA sequence variants using the Tajima's D and Fu and Li D* tests (Y. X. Fu, W. H. Li, (1993) Genetics 133:693-709 and F. Tajima, (1989) Genetics 123, 585-95). The standard neutral model assumes a random-mating population of constant size, with all mutations uniquely arising and selectively neutral. The continental frequency distribution of pairwise mtDNA sequence differences was calculated to test for rapid population expansion using the method of A. R. Rogers, H. Harpending, (1992) Mol. Biol. Evol. 9:552-569.

[0117] For the African mtDNA sequences (n=32), the results did not significantly deviate from the standard neutral model, and the frequency distribution of pairwise sequence difference counts was broad and ragged. Both of these results are consistent with the model that the African population has been relatively stable for a long time. By contrast, the non-African mtDNAs (n=72) showed a highly significant deviation from neutrality (Tajima's D=-2.43, P<0.01; Fu and Li D*=-5.09, P<0.02), as well as a bell-shaped frequency distribution of pairwise sequence differences. Thus, these results are consistent with population expansions having distorted the frequency distribution (L. Excoffier, J. Mol. Evol. 30:125-39 (1990) and D. A. Merriwether et al. (1991) J. Mol. Evol. 33:543-555).

[0118] To better define the regional distribution of these demographic influences, the Eurasian samples were divided into European and Asian plus Native American. Analysis of all European mtDNAs also revealed significant deviations from the standard neutral model (Tajima's D=-2.19, P<0.01; Fu and Li D*=-3.31, P<0.02). The distribution of pairwise sequence differences for the European mtDNAs revealed two sharp peaks, hinting at two major expansion phases. The most recent of these peaks was lost when haplogroup H and V mtDNAs were deleted from the sample. Hence, haplogroup H, which represents 40% of modern European mtDNAs (A. Torroni et al., American Journal of Human Genetics 62, 1137-1152 (1998)) and has a MRCA of 19,000 YBP, came to predominate in Europe relatively recently.

[0119] Analysis of the aggregated Asian and Native American mtDNAs (n=41) also revealed significant deviations from the standard neutral model (Tajima's D=-2.28, P<0.01; Fu and Li D*=-4.31, P<0.02) as well as revealing a broad, bell-shaped distribution of pairwise differences consistent with rapid population expansion.

[0120] When the Asian-Native American haplogroups A, B, C, D and X mtDNAs (n=26) were analyzed separately, they also showed significant deviation from neutrality for the Fu and Li D* test (D*=-2.65, P<0.05), although not for the Tajima's D test (D=-1.60, ns). Their distribution of pairwise sequence differences was also strongly uni-modal, indicating that the population expanded as people moved through Siberia and Beringia and into the Americas.

Example 9

[0121] Variable Replacement Mutation Rates in Human mtDNA Genes

[0122] To determine if selection was an important factor in causing the sudden shifts in mtDNA sequence variation between continents, the number of non-synonymous to synonymous base substitutions was analyzed for all 13 mtDNA protein genes of those haplogroups which contributed to the colonization of each of the major continental spaces: African, European, and Native American. For example, for the "Native Americans" the mtDNAs from the Asian-Native American haplogroups A, B, C, D and X were combined. The Asian-Native American mtDNAs from the haplogroups were combined because random mutations accumulate in founder populations and those mtDNAs which prove advantageous in new environments are enriched. Hence, the founding mutations of the haplogroup are important in the continental success of the lineage. We then tested for possible selective effects during the colonization of each continent by comparing the ratio of non-synonymous versus synonymous nucleotide substitutions for each mtDNA gene. An increase in the non-synonymous to synonymous mutation ratio suggests that selection has favored the propagation of a functionally altered protein.

[0123] The comparison of the ratio of non-synonymous to synonymous mutations, counting each change only once, revealed great variation between continents for several genes (Table 12). Marked increases in the accumulation of non-synonymous mutations were seen for ND3 in Africans, Cytb and COIII in Europeans, and ATP6 in Native Americans. The number of non-synonymous and synonymous mutations for each gene was also compared between the different continents by computing the P value using a two-tailed Fisher Exact Test. This revealed significant differences between Africans and both Europeans and Native Americans for COIII, between Africans and Native Americans for ATP6, and between Africans and Europeans for the sum of all mtDNA genes (Table 12). Hence, this analysis supports the hypothesis that selection has played a role in shaping continental mtDNA protein variation.

TABLE 12*

        Number of Polymorphic Sites                                              Two-Tail FET P-value
        African                European               Native American
Gene    N-syn  Syn   Ratio     N-syn  Syn   Ratio     N-syn  Syn   Ratio     Afr vs Eur  Afr vs Am  Eur vs Am
ND1     10     17    0.59      5      5     1.00      4      4     1.00      0.71        0.69       1.00
ND2     9      22    0.41      4      9     0.44      3      7     0.43      1.00        1.00       1.00
ND3     6      2     3.00      1      3     0.33      1      4     0.25      0.22        0.10       1.00
ND4L    0      7     0.00      0      1     0.00      1      4     0.25      1.00        0.42       1.00
ND4     4      35    0.11      2      13    0.15      3      12    0.25      1.00        0.38       1.00
ND5     15     31    0.48      8      20    0.40      2      14    0.14      0.80        0.19       0.28
ND6     2      14    0.14      1      6     0.17      3      5     0.60      1.00        0.29       0.57
Cytb    11     19    0.58      14     9     1.56      5      12    0.42      0.10        0.75       0.60
COI     7      30    0.23      0      9     0.00      0      13    0.00      0.32        0.17       1.00
COII    3      19    0.16      0      4     0.00      2      6     0.33      1.00        0.59       0.52
COIII   1      13    0.08      6      5     1.20      7      10    0.70      m2          (.5        0.70
ATP6    3      15    0.20      5      6     0.83      7      5     1.40      0.20        (.5        0.68
ATP8    2      3     0.67      2      0               1      3     0.33      0.43        1.00       0.40
Total   73     227   0.32      48     90    0.53      39     99    0.39      in          0.41       0.30

* Replacement versus synonymous mutation numbers of mtDNA genes. Rplmt = replacement mutations, ratio = rplmt/silent. FET = Fisher Exact Test. Afr = Africa, Eur = Europe, Am = Native American. The ratios of polymorphic sites in bold-italics highlight some of the higher values observed. Those in bold-italics under Two-Tailed FET indicate comparisons that are significant at the 0.05 level.
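The ratio and Fisher Exact Test columns of Table 12 can be recomputed from the site counts alone. The sketch below is an illustrative standard-library implementation of a two-tailed Fisher exact test (summing the probabilities of all tables no more likely than the observed one); it is offered as an assumption about how such a P value may be obtained, not as the software used for Table 12, and the function names are hypothetical. The example uses the COIII counts for Africans (1 non-synonymous, 13 synonymous) and Europeans (6 non-synonymous, 5 synonymous).

```python
from math import comb


def nonsyn_syn_ratio(nonsyn, syn):
    """Ratio of non-synonymous to synonymous sites; None when syn is 0."""
    return None if syn == 0 else nonsyn / syn


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    observed = prob(a)
    # Small tolerance so that ties with the observed table are included.
    return sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# COIII: Africans (1 non-synonymous, 13 synonymous) vs Europeans (6, 5).
p_coiii = fisher_exact_two_tailed(1, 13, 6, 5)
print(f"COIII ratios: {nonsyn_syn_ratio(1, 13):.2f} vs {nonsyn_syn_ratio(6, 5):.2f}")
print(f"COIII Afr vs Eur, two-tailed P = {p_coiii:.4f}")
```

For these counts the P value falls below 0.05, in line with the statement in Example 9 that Africans and Europeans differ significantly for COIII.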

Example 10

[0124] Since the above analysis counts each mutation only once, irrespective of its frequency within the haplogroup, it under-emphasizes the importance of nodal mutations and over-emphasizes the importance of terminal private polymorphisms. As an alternative to this approach, we calculated the corrected non-synonymous (Ka) and synonymous (Ks) mutation frequencies and then determined the relative selective constraints acting on that gene by calculating the kc value {kc = -ln(Ka/Ks)}. A high kc value is indicative of high protein sequence conservation and low amino acid variation, while a low value is indicative of low protein conservation and high amino acid variation (N. Neckelmann et al., (1987) Proc. Natl. Acad. Sci. USA 84:7580-7584).

[0125] The kc values for each human mtDNA gene were compared across the total global collection of human mtDNA sequences (FIG. 4). The ATP6 gene was the least conserved gene in the human mtDNA, though previously it had been shown to be relatively highly conserved in inter-specific comparisons (N. Neckelmann et al., (1987) Proc. Natl. Acad. Sci. USA 84:7580-7584).

Example 11

[0126] The higher inter-specific conservation of ATP6 was confirmed by comparing the kc values of human versus chimpanzee (Pan troglodytes) and bonobo (Pan paniscus); human versus eight primate species (baboon, Bornean and Sumatran orangutan, gibbon, gorilla, lowland gorilla, bonobo, and chimpanzee); and human versus 13 diverse mammalian species (bovine, mouse, cat, dog, pig, rat, rhinoceros, horse, gibbon, gorilla, orangutan, bonobo, chimpanzee) (FIG. 3). Thus, while ATP6 is highly conserved between species, it is very poorly conserved within humans. These results are consistent with the reduced intra-specific versus inter-specific conservation observed for other genes (C. A. Wise et al., (1998) Genetics 148:409-21), and with the hypothesis that mitochondrial protein variation is accelerated in humans and other primates, as seen in cytochrome c oxidase genes (L. I. Grossman et al., (2001) Mol. Phylogenet. Evol. 18:26-36).

Example 12

[0127] To further investigate the possibility that individual mtDNA protein genes differ in their selective constraints in different human continental populations, kc values for all 13 mtDNA protein genes from each set of continental haplogroups were calculated: African, European, and Native American. The cumulative selective pressure that separated the mtDNAs of pairs of continents was calculated by pair-wise comparison of the kc values for the genes of each mtDNA (Table 13). Comparison of mtDNA protein kc values in Europeans versus Africans revealed that three genes (ND1, cytb and COIII) had significantly lower sequence conservation in Europeans. A comparison of the kc values of Native American versus African mtDNA genes revealed six genes (ND4, ND6, COII, COIII, ATP6 and ATP8) that had significantly lower sequence conservation in Native Americans. Finally, comparison of the kc values of Africans versus Europeans or Native Americans revealed four mtDNA genes (ND3, ND5, cytb, and COI) that had significantly lower sequence conservation in Africans. The greatest differences in kc values were seen for the comparisons of COIII and ATP6 between Africans and Native Americans and for COIII between Africans and Europeans (Table 13).
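As a minimal sketch of the selective-constraint calculation in Example 10, the following computes kc = -ln(Ka/Ks) from already-corrected Ka and Ks values. The function name and the input values are hypothetical; how Ka and Ks are themselves estimated and corrected is not reproduced here.

```python
import math


def selective_constraint(ka, ks):
    """Return kc = -ln(Ka/Ks), or None if either frequency is zero.

    ka: corrected non-synonymous mutation frequency
    ks: corrected synonymous mutation frequency
    """
    if ka <= 0 or ks <= 0:
        # The logarithm is undefined; such entries are reported as
        # not calculable rather than as a number.
        return None
    return -math.log(ka / ks)


# Illustrative inputs only (not values from the specification).
conserved = selective_constraint(0.01, 0.10)   # Ka << Ks: high kc
variable = selective_constraint(0.12, 0.10)    # Ka > Ks: kc below zero
```

A gene with few amino acid changes relative to silent changes yields a large positive kc, while a gene in which replacement changes outnumber silent ones yields a negative kc.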

TABLE 13

GENES   African         European        T-test        Native American    T-test
        sequences       sequences       P value       sequences          P value
        (n = 32)        (n = 31)                      {A, B, C, D, X}
                                                      (n = 26)
ND1     2.08 ± 1.18     0.27 ± 1.90     P < 0.0001    2.07 ± 1.92
ND2     1.72 ± 1.07     1.57 ± 1.85     NS            1.81 ± 1.11
ND3     0.51 ± 1.87     0.91 ± 2.32     NS            1.70 ± 1.32
ND4L    *               *               NS            2.41 ± 3.83
ND4     3.49 ± 1.34     3.39 ± 2.23     NS            2.20 ± 1.19
ND5     1.78 ± 0.71     2.20 ± 1.20     NS            3.63 ± 3.56
ND6     2.51 ± 1.19     3.13 ± 3.99     NS            1.15 ± 1.52
Cytb    1.89 ± 0.96     0.34 ± 1.51     P < 0.0001    2.46 ± 1.15
COI     2.37 ± 0.95     3.85 ± 3.93     P < 0.05      *
COII    2.73 ± 1.32     *               *             1.74 ± 2.12
COIII   4.65 ± 3.94     0.94 ± 2.08                   2.11 ± 1.26
ATP6    2.31 ± 1.28     1.48 ± 2.28                   -0.14 ± 1.34
ATP8    2.62 ± 1.89     *               *             1.25 ± 1.94

* Estimates of coefficients of selective constraint (kc) stratified by gene and region. kc values and standard deviations calculated for African, European and Asian-American haplogroups A, B, C, D and X mtDNA protein-coding genes. * indicates that kc values could not be calculated, since either Ka or Ks was 0. Haplogroup X is represented only by the Native-American sequence, the European X sequence being excluded.

[0128] Taken together, these data show that different selective forces have acted on individual mtDNA genes as humans colonized different continents. Moreover, the observed differences in mtDNA protein sequence correlate with the climatic transitions that humans would have experienced as they migrated out of tropical and sub-tropical Africa and into temperate Eurasia and arctic Siberia and Beringia. The mtDNA genes that showed the highest amino acid sequence variation between continents were COIII and ATP6.

Example 13

[0129] The nucleotide alleles in Table 3 residing in evolutionarily significant genes identified in Examples 9-12 were analyzed for evolutionary significance. Evolutionarily significant alleles reside in evolutionarily significant genes and cause amino acid changes. A list of the evolutionarily significant nucleotide alleles in ND1, ND2, ND3, ND4, ND5, ND6, Cytb, COI, COII, COIII, ATP6, and ATP8 appears in Table 14. The Cambridge nucleotide alleles in Table 14 are evolutionarily significant. The corresponding amino acid alleles, including the Cambridge alleles, are evolutionarily significant. The locations of the amino acid alleles are identified by the location of the nucleotide allele listed in Table 3. Other evolutionarily significant nucleotide alleles, not listed in Table 14, include alleles at neighboring nucleotide loci that are within the same codon and code for the same amino acids that are listed in Table 14.
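The notion of synonymous codings used in Example 13 (different nucleotide alleles within one codon that encode the same amino acid) can be sketched as follows. This is only an illustration: the function names are hypothetical, the codon table is the vertebrate mitochondrial genetic code, and the example codons are chosen for illustration rather than read from the Cambridge sequence.

```python
from itertools import product

# Vertebrate mitochondrial genetic code, codons enumerated in TCAG order.
# It differs from the standard code at TGA (W), ATA (M), and AGA/AGG (stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSS**"
    "VVVVAAAADDEEGGGG"
)
MT_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(codon):
    """All other codons that encode the same amino acid as `codon`."""
    amino_acid = MT_CODE[codon.upper()]
    return sorted(c for c, aa in MT_CODE.items()
                  if aa == amino_acid and c != codon.upper())


def amino_acid_change(reference_codon, variant_codon):
    """Return (reference_aa, variant_aa, is_replacement) for a codon change."""
    ref_aa = MT_CODE[reference_codon.upper()]
    var_aa = MT_CODE[variant_codon.upper()]
    return ref_aa, var_aa, ref_aa != var_aa


# Illustrative codons: a threonine codon changed at its first position.
change = amino_acid_change("ACC", "GCC")
equivalents = synonymous_codons("GCC")
```

Under this sketch, any codon returned by `synonymous_codons` for the variant codon is an equivalent coding of the same amino acid allele. Genes encoded on the opposite strand (ND6) would need reverse-complementing first, which is not handled here.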

TABLE 14
Evolutionarily Significant Human Mitochondrial Nucleotide and Amino Acid Alleles

Gene    Genome     Cambridge    Non-Cambridge   Cambridge     Non-Cambridge
        Location   Nucleotide   Nucl. Allele    Amino Acid    AA Allele

ND1     3308       T            C               M             T
ND1     3316       G            A               A             T
ND1     3394       T            C               Y             H
ND1     3505       A            G               T             A
ND1     3547       A            G               I             V
ND1     3565       A            G               T             A
ND1     3644       T            C               V             A
ND1     3796       A            T               T             A
ND1     3796       A            G               T             S
ND1     3796       A            C               T             P
ND1     3808       A            G               T             A
ND1     3866       T            C               I             T
ND1     4025       C            T               T             M
ND1     4040       C            T               T             M
ND1     4048       G            A               D             N
ND1     4123       A            G               I             V
ND1     4216       T            C               Y             H
ND1     4225       A            G               M             V
ND1     4232       T            C               I             T
ND2     4491
ND2     4506
ND2     4512
ND2     4596
ND2     4695
ND2     4767
ND2     4824
ND2     4833
ND2     4917
ND2     4960
ND2     5043
ND2     5046
ND2     5178
ND2     5262
ND2     5263
ND2     5331
ND2     5442
ND2     5460
COI     6150
COI     6253
COI     6324
COI     6366
COI     6607
COI     714
COI
COI
COI
COI
COI
COI
COI
COI
COI
ATP8
ATP8
ATP8
ATP8
ATP8
ATP8
ATP6    854
ATP6    856
ATP6    856
ATP6    858
ATP6    8618
ATP6    8701
ATP6    8705
ATP6    8764
ATP6    8794
ATP6    8836
ATP6    8860
ATP6    8875
ATP6    8962
ATP6    9053
ATP6    9055
ATP6    9077
ATP6    9103
ATP6    9136
ATP6    9151
COIII   9237
COIII   9325
COIII   9355
COIII   9456
COIII   9402
COIII   9477
COIII   9559
COIII   9591
COIII   9667
COIII   9682
COIII   9822
COIII   9957
COIII   9966
ND3     10086
ND3     10086
ND3     10152
ND3     10182
ND3     10197
ND3     10321
ND3     10398
ND4L    10609
ND4     10816
ND4     10920
ND4     11016
ND4     11078
ND4     11150
ND4     11172
ND4     11177
ND4     11654
ND4     11909
ND4     11963
ND4     11969
ND4     12083
ND4     12134
ND5     12346
ND5     12358
ND5     12361
ND5     12373
ND5     12397
ND5     12406
ND5     12635
ND5     12850
ND5     12940
ND5     12967
ND5     13104
ND5     13105
ND5     13135
ND5     13145
ND5     13276
ND5     13477
ND5     13651
ND5     13660
ND5     13708
ND5     13759
ND5     13780
ND5     13789
ND5     13819
ND5     13880
ND5     13886
ND5     13924
ND5     13927
ND5     13928
ND5     13958
ND5     13966
ND5     14000
ND5     14059
ND5     14128
ND6     14178
ND6     14272
ND6     14318
ND6     14319
ND6     14384
ND6     14459
ND6     14484
ND6     14502
ND6     14571
CytB    14766
CytB    14769
CytB    14793      A            G               H
CytB    14798      T            C               F
CytB    14861      G            A               A
CytB    14862      C            T               A
CytB    14979      T            C               I
CytB    15110      G            A               A
CytB    15113      A            G               T
CytB    15204      T            C               I
CytB    15218      A            C               T
CytB    15218      A            G               T
CytB    15238      C            G               I
CytB    15257      G            A               D
CytB    15261      G            A               S
CytB    15317      G            A               A
CytB    15318      C            T               A
CytB    15323      G            A               A
CytB    15326      A            G               T
CytB    15431      G            A               A
CytB    15452      C            A               L
CytB    15497      G            A               G
CytB    15519      T            C               L
CytB    15663      T            C               I
CytB    15731      G            A               A
CytB    15746      A            G               I
CytB    15803      G            A               V
CytB    15806      G            A               A
CytB    15812      G            A               V
CytB    15824      A            G               T
CytB    15849      C            T               T
CytB    15884      G            C               A

[0130] A subset of the alleles in Table 14 that are associated with predispositions to physiological conditions using the methods of this invention is listed in Table 15.

TABLE 15
Amino Acid Alleles Associated with Physiological Conditions in this Invention

Gene    Genome     Nucleotide Alleles   Amino Acid Alleles   Haplogroups
        Location   Useful for           Useful for           Diagnosable
                   Diagnosing           Diagnosing           by Alleles
                   Haplogroups          Haplogroups

ND1     3796       C                    P                    (L1b2)
ND2     4833       G                    A                    G
ND2     4917       G                    D                    T
ND2     5046       A                    I                    W
ND2     5178       A                    M                    D
ND2     5442       C                    L                    L0
ND2     5460       A                    T                    W
COI     7146       G                    A                    L0, L1
COI     7389       C                    H                    L1
ATP8    8414       T                    F                    D
ATP6    8584       A                    T                    C
ATP6    8618       C                    T                    (L3b)
ATP6    8701       A                    T                    A, I, W, X, B, F, Y, U, J, T, V, H
ATP6    9055       A                    T                    (Uk)
COIII   9402       C                    P                    L0
ND3     10086      C                    H                    (L3c)
ND3     10398      A                    T                    (L3d)
ND4     11078      G                    V                    Z
ND5     12406      A                    I                    F
ND5     13104      G                    V                    (U1)
ND5     13105      G                    V                    L0, L1
ND5     13276      G                    V                    L0
ND5     13708      A                    T                    J
ND5     13789      C                    H                    L1
ND5     13958      C                    A                    (L2c)
ND6     14178      C                    V                    L1
ND6     14318      C                    S                    C
CytB    14766      C                    T                    V, H
CytB    15452      A                    I                    J, T
CytB    15884      C                    P                    W

Example 14

[0131] Continent-Specific Amino Acid Substitutions in ATP6

[0132] To further investigate the biological significance of the human continent-specific ATP6 amino acid substitutions, the amino acid conservation for each variable human position was analyzed using mtDNAs from 39 animal species (12 primates, 22 other mammals, four non-mammalian vertebrates, and Drosophila). This revealed that many of the ATP6 substitutions that are associated with particular mtDNA haplogroups alter evolutionarily conserved, and hence potentially functionally important, amino acids.

[0133] A threonine to alanine substitution at codon 59 (T59A, nucleotide location 8701-8703) in ATP6 separates the mtDNAs of macro-haplogroup N from the rest of the world. The polar threonine at position 59 is conserved in all great apes and some old-world monkeys.

[0134] Among the haplogroups of macro-haplogroup M, the related Siberian-Native American haplogroups C and Z are delineated by an A20T (nucleotide location 8584-8586) variant. A non-polar amino acid occurs at this position in all animal species except for Macaca, Papio, Balaenoptera and Drosophila.

[0135] Among the haplogroups of macro-haplogroup N, the non-R lineage N1b harbors two distinctive amino acid substitutions, M104V (nucleotide location 8836-8838) and T146A (nucleotide location 8962-8964). The methionine at position 104 is conserved in all mammals, and the threonine at position 146 is conserved throughout all animal mtDNAs. Moreover, the T146A substitution is within the same transmembrane α-helix as the pathogenic mutation L156R that alters the coupling efficiency of the ATP synthase and causes the NARP and Leigh syndromes (I. Trounce, S. Neill, D. C. Wallace, Proceedings of the National Academy of Sciences of the United States of America 91, 8334-8338 (1994)).

[0136] Also in macro-haplogroup N, haplogroup A mtDNAs harbor an H90Y (nucleotide location 8794-8796) amino acid substitution. The histidine in this position is conserved in all placental mammals except Pongo, Cebus and Loxodonta and occurs within a highly conserved region. Furthermore, among the heterogeneous group of mtDNAs carrying the tRNA-COII 9 bp deletion and arbitrarily assigned to haplogroup B, one mtDNA harbored an F193L (nucleotide location 9103-9105) substitution. This position is conserved in all mammals except Pongo, Papio, Cebus and Erinaceus.

[0137] Since each of the mtDNA sequences used in this comparison of different species is derived from only one or two individuals, it is possible that the rare deviant cases are due to the accumulation of environmentally adaptive mutations in those species that parallel those in humans. Thus, the above ATP6 amino acid polymorphisms have the characteristics expected for evolutionary adaptive mutations.

TABLE 16

Nucleotide   Nucleotide   WIPO
Locus        Alleles      code

64           CT           y
72           TC           y
73           AG
89           TC           y
93           AG
95           AC
114          CT           y
143          GA
146          TC           y
150          CT           y
151          CT           y
152          TC           y
153          AG
171          GA
180          TC
182          CT
183          AG
185          GAT
185          GAT
186          CA
189          ACG
189          ACG
194          CT
195          TAC
195          TAC
198          CT
199          TC
200          AG
204          TC
207          GA
208          TC
210          AG
212          TC
215          AG
217          TC
225          GA
227          AG
228          GA
235          AG
236          TC
247          GA
250          TC
252          TC
263          AG
291          AG
295          CT
297          AG
316          GA
317          CAG
317          CAG
320          CT
325          CT
340          CT
357          AG
373          AG
400          TG
408          TA
418          CT
456          CT
462          CT
465          CT
467          CT
471          TC
480          TC
482          TC
489          TC
493          AG
499          GA
508          AG
593          TC
597          CT
663          AG
678          TC
680          TC
709          GA
710          TC
721          TC
750          AG
769          GA
825          TA
827          AG
850          TC
921          TC
930          GA
961          TCG


TABLE 16-continued TABLE 16-continued

Nucleotide Nucleotide Nucleotide Nucleotide Locus Alleles WIPO code Locus Alleles WIPO code 4960 CT y 6548 CT y 4977 TC y 6587 CT y 4994 AG 66O7 TC y SOO4 TC y 668O TC y 5027 CT y 6713 CT y SO36 AG 6719 TC y SO43 GT k 6734 GA SO46 GA 6752 AG 5063 TC y 6770 AG 5096 TC y 6776 TC y 5108 TC y 6815 TC y 5147 GA 6827 TC y 5153 AG 6875 CA 5178 CA 6938 CT y 5231 GA 6962 GA 5237 GA 6989 AG 5255 CT y 7028 CT y 5262 GA 7052 AG 5263 CT y 7055 AG 5285 AG 7058 TA w 53OO CT y 7076 AG 533O CA 71.46 AG 5331 CA 71.54 AG 5390 AG 7175 TC y 5393 TC y 71.96 CA 5417 GA 72O2 AG 5426 TC y 7226 GA 5442 TC y 7256 CT y 546O GA 7257 AG 5465 TC y 7271 AG 547 GA 7274 CT y 5492 TC y 7319 TC y 5495 TC y 7337 GA 558O TC y 7347 GA 558 AG 7389 TC y 560 CT y 7403 AG 5603 CT y 7424 AG 5606 CT y 7444 GA 5633 CT y 7476 CT y 5655 TC y 7493 CT y 571 AG 752 GA 5773 GA 756 TC y 581 AG 757 AG 5814 TC y 7600 GA 582 GA 7624 TA w 5826 TC y 7645 TC y 5843 AG 7648 CT y 595 AG 766O TC y 5984 AG 7664 GA 5987 CT y 7673 AG 6O26 GA 7675 CT y 6O29 CT y 7693 CT y 6O45 CT y 7694 CT y 6O7 TC y 7697 GA 6O77 CT y 7744 TC y 6104 CT y 7765 AG 6150 GA 7768 AG 6152 TC y 777 AG 61.64 CT y 7858 CT y 6167 TC y 786 TC y 6182 GA 7864 CT y 6.185 TC y 7867 CT y 622 TC y 7933 AG 6227 TC y 7948 CT y 6253 TC y 7999 TC y 6257 GA 8O14 AG 6324 GA 8O2O GA 6366 GA 8O27 GA 637 CT y 808O CT y 6392 TC y 8087 TC y 6473 CT y 8113 CA 649 CA 8142 CT y 6524 TC y 81.49 AG


TABLE 16-continued TABLE 16-continued

Nucleotide Nucleotide Nucleotide Nucleotide Locus Alleles WIPO code Locus Alleles WIPO code

O550 AG 1969 GA O586 GA 2007 GA O589 GA 2049 CT y O609 TC y 2070 GA O637 CT y 2083 TG k O640 TC y 2121 TC y O646 GA 2134 TC y O659 CT y 2153 CT y O664 CT y 2172 AG O667 TC y 2175 TC y O688 GA 2234 AG O736 CT y 2236 GA O790 TC y 2239 CT y O792 AG 2248 AG O793 CT y 2308 AG O804 AG 2346 CT y O810 TC y 2358 AG O819 AG 2361 AG O828 TC y 2372 GA O873 TC y 2373 AG O876 AG 2397 AG O894 CT y 24O6 GA O915 TC y 2414 TC y O920 CT y 2477 TC y O939 CT y 25O1 GA O966 TC y 2507 AG O984 CG S 2519 TC y OO2 AG 2528 GA O16 GA 2540 AG O17 TC y 2612 AG O23 AG 2630 GA O78 AG 2633 CT y O92 AG 2635 TC y 147 TC y 2669 CT y 150 GA 2672 AG 167 AG 2693 AG 172 AG 2705 CT y 176 GA 2.720 AG 177 CT y 2738 TC y 215 CT y 2768 AG 251 AG 2771 GA 257 CT y 2810 AG 296 CT y 2822 AG 299 TC y 2850 AG 332 CT y 2879 TC y 362 AG 2882 CT y 365 TC y 293O AT w 377 GA 2940 GA 467 AG 2948 AG 476 CT y 2967 AC 536 CT y 2972 AG 590 AG 2999 AG 611 GA 3O2O TC y 641 AG 3059 CT y 653 AG 3068 AG 654 AG 31O1 AC 674 CT y 3104 AG 701 TC y 3105 AG 719 GA 3135 GA 722 TC y 3143 TC y 767 CT y 31.45 GA 812 AG 3149 AG 854 TC y 3.194 GA 884 AG 3197 CT y 887 GA 3212 CT y 893 AG 3221 AG 899 TC y 3263 AG 909 AG 3.276 AG 914 GA 3281 TC y 944 TC y 3368 GA 947 AG 3440 CG S 959 AG 3477 GA 963 GA 34.85 AG


TABLE 16-continued TABLE 16-continued

Nucleotide Nucleotide Nucleotide Nucleotide Locus Alleles WIPO code Locus Alleles WIPO code

15803 GA 16231 TC y 15806 GA 16232 CT y 15812 GA 16234 CT y 15824 AG 16235 AG 15833 CT y 16239 CT 15849 CT y y 15884 GC S 16241 AG r 15900 TC y 16242 CT y 15904 CT y 16243 TC y 15907 AG 16245 CT y 15924 AG 16247 AG I 15927 GA 16249 TC y 15928 GA 16254 AC Il 15930 GA 16255 GA I 15932 TC y 16256 CT y 15939 CT y 16257 CT y 15941 TC y 16258 AG 3. ES y 16260 CT y 16017 TC y E. T y 16038 AG r y 16265 AC l 16051 AG r 16266 CT 16069 CT y y 16071 CT y 16268 CT y 16075 TC y 16270 CT y 16086 TC y 16271 TC y 16093 TC y 16274 GA I 16108 CT y 16278 CT y 16111 CT y 16284 AG 16114 CA l 16286 CG S 16124 TC y 16287 CT y 16126 TC y 16288 TC y 16129 GCA w 1629O CT y 16129 GCA w 16291 CT y 16140 TC y 16292 CT y 16144 TC y 16293 AG r 16145 GA r 16294 CT y 16147 CT y 16148 CT y 16296 CT y 16153 GA 16298 TC y 16162 AG 16304 TC y 16163 AC 16309 AG 16166 AC l 16311 TC y 16167 CT y 16316 AG 16168 CT y 16317 AT W 16169 CT y 16318 AT W 16171 AG 16319 GA r 16172 TC y 1632O CT y 16175 AG r 16324 TC y 16176 CT y 16325 TC y 16182 AC m 16326 AG I 161.83 AC m 16327 CT y 16184 CT y 16343 AG 16185 CT y 16344 CT y 16186 CT y 16354 CT y 161.87 CT y 16355 CT 16188 CAG w y 16188 CAG w 16356 TC y 16189 TC y 16357 TC y 16.192 CT y 16360 CT y 16193 CT y 16362 TC y 162O7 AG r 16366 CT y 16209 TC y 16368 TC y 16212 AG r 16390 GA 16213 GA r 16391 GA I 16214 CT y 16399 AG I E. A, y 16438 GA I 16223 CT y 16439 CA l 16224 TC y 16483 GA 16227 AG 16519 TC y 16229 TC y 16527 CT y 16230 AG US 2005/O123913 A1 Jun. 9, 2005 47
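The codon numbers and nucleotide locations quoted for the ATP6 substitutions in Example 14 are related by simple arithmetic, sketched below. The starting coordinate of the ATP6 reading frame (8527) is an assumption that is not stated in the text above; it was chosen because it reproduces every codon range quoted in Example 14. The function names are hypothetical.

```python
# Assumption: first nucleotide of the ATP6 reading frame in Cambridge numbering.
ATP6_START = 8527


def codon_to_nucleotides(codon, gene_start=ATP6_START):
    """Return (first, last) nucleotide locations of a 1-based codon number."""
    first = gene_start + 3 * (codon - 1)
    return first, first + 2


def nucleotide_to_codon(location, gene_start=ATP6_START):
    """Return (codon number, position within codon), both 1-based."""
    offset = location - gene_start
    if offset < 0:
        raise ValueError("location lies upstream of the gene start")
    return offset // 3 + 1, offset % 3 + 1


# Substitutions named in Example 14, keyed by codon number.
substitutions = {20: "A20T", 59: "T59A", 90: "H90Y",
                 104: "M104V", 146: "T146A", 193: "F193L"}
ranges = {name: codon_to_nucleotides(codon)
          for codon, name in substitutions.items()}
```

The same two functions apply to any forward-strand gene once its own start coordinate is supplied as `gene_start`.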

Reference to Sequence Listings

[0138] SEQ ID NO:1 is a theoretical human mtDNA genome sequence containing the nucleotide alleles of this invention as listed in Table 3.

[0139] SEQ ID NO:2 is the human mtDNA reference sequence called the Cambridge Sequence (GenBank Accession No. J01415).
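A composite sequence of this kind can represent several alleles at one locus with a single nucleotide ambiguity symbol, as in the WIPO code column of Table 16. The following is a minimal sketch of that encoding, assuming the usual ambiguity symbols (r, y, m, k, s, w, b, d, h, v, n); the function names and the four-base example sequence are hypothetical and do not reproduce how SEQ ID NO:1 was actually assembled.

```python
# Nucleotide ambiguity symbols, keyed by the set of bases each one stands for.
AMBIGUITY = {
    frozenset("A"): "a", frozenset("C"): "c",
    frozenset("G"): "g", frozenset("T"): "t",
    frozenset("AG"): "r", frozenset("CT"): "y",
    frozenset("AC"): "m", frozenset("GT"): "k",
    frozenset("CG"): "s", frozenset("AT"): "w",
    frozenset("CGT"): "b", frozenset("AGT"): "d",
    frozenset("ACT"): "h", frozenset("ACG"): "v",
    frozenset("ACGT"): "n",
}


def wipo_code(alleles):
    """Ambiguity symbol for a string of alleles such as 'CT' or 'GAT'."""
    return AMBIGUITY[frozenset(alleles.upper())]


def apply_alleles(reference, alleles_by_locus):
    """Write ambiguity symbols into a reference sequence.

    `alleles_by_locus` maps 1-based nucleotide loci to allele strings.
    """
    sequence = list(reference.lower())
    for locus, alleles in alleles_by_locus.items():
        sequence[locus - 1] = wipo_code(alleles)
    return "".join(sequence)


# Toy example: a four-base reference with alleles A and G at locus 2.
composite = apply_alleles("gatc", {2: "AG"})
```

The order in which alleles are listed does not matter here, so "CT" and "TC" map to the same symbol.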

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 2

<210> SEQ ID NO 1
<211> LENGTH: 16569
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: SEQ ID NO: 1 is a composite sequence of the Cambridge sequence and human mitochondrial DNA sequence alleles.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (3106)..(3106)
<223> OTHER INFORMATION: n at position 3106 is a deletion
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (3796)..(3796)
<223> OTHER INFORMATION: n at position 3796 is a, g, c or t

<400> SEQUENCE: 1

gatcacaggt ctatcaccct attaaccact cacgggagct ctccatgcat ttggtatttt 60

Cgtytggggg gyrtgcacgc gatagoatyg C grgmcgctg gag.ccggagc accy tatgtc 120 gcagtatctg. tctittgattc ctrc cycaty yyrttattta togcaccitac rttcaataty 18O ayrgdmgavc atayhtayyr aagygtry tr aytartyaat gcttrtrrga cataryaata 240 acaattraay glyctocacag ccrctittc.ca cacagacatc ataacaaaaa. rttty circca aaccoccoct cocccrvitty tgg cyacago actitaaacay atctotgcca a CCCC aaa. 360 acaaagaacc ctracaccag cctaaccaga tittcaaattk tatctittwgg cgg tatgyac 420 ttittaacagt caccocccaa citalacacatt attittyccct cycaytycca yac tactaay 480 cycatcaaya carccoccirc ccatcc tricc cagdacacac acaccgctoc taa.ccc.cata 540 cc.ccgalacca accaaaccoc aaaga caccc cccacagttt atgtagctta ccyccityaaa 600 gcaatacact gaaaatgttt agacgggctC acatcacccc. ataaacaaat aggtttggto 660 ctrgccttitc tattagcycy tagtaagatt acacatgcaa gcatc.cccry to cagtgagt 720 ycaccotcta aatcaccacg atcaaaaggr acaag catca agcacgcarc aatgcagctic aaaacgctta gcc tag ccac accoccacgg gaalacagcag tgatwarcct ttagdaataa 840 acgaaagtty alactaagcta tactaa.cccd. agggttggtc. aattitcgtgc cagocaccgc 9 OO gg to acacga ttalacc caag ycaatagaar cc.ggc gtaaa gagtgttitta gatcaccccc 96.O bc.cccaataa agctaaaact caccitgagtt gtaaaaaact ccagttgaca caaaatarac 1020 tacgaaagtg gctttalacat rtctgaayac acaatagota agacccaaac tgggattaga 1080 taccccacta tacttagccc taalaccitcaia cagttaaayc aacaaaactg citc.gc.cagaa 1140 cactacgagc cacagottaa aactcaaagg acctggcggit gcttcatayc cctictagagg 1200 agcctgttct gtaatcgata aac cocq atc alacct cacca ccycttgctc agcctatata 1260 US 2005/O123913 A1 Jun. 9, 2005 48

-continued cc.gc.catctt cagoaaacco to atgaaggy tacaaagtaa go.gcaagtac coacgtaaag 320 acgttagg to aaggtgtagc ccatgaggtg gcaagaaatg ggctacattt totaccc.ca.g 38O amalactacga tagcc.cittat gaaacytalag g g to raaggy ggatttagca gtaaactrag 4 40 artagagtgc titagttgaac agggccct ga agcgc.gtaca caccgc.ccgt cacccitcctic 5 OO aart at actt caaaggacat ttalactaaaa coccitacgca tittatataga ggaga caagt 560 cgta acatgg taagtgtact ggaaagtgca cittggacraa ccagagtgta gotta acaca 62O aag cacccaa cittacactta ggagatttica actta acttg accgctotga gctaalaccita 680 gcc.ccaaacc cactccaccy taytaycara caacyttarc caaac cattt accoarayaa 740 agtatagg.cg atagaaattgaaacctgg.cg caatagatay agtaccgcaa goggaaagatg 800 aaaaattata rccaag cata atatagoaag gacta accoc tatacct tct gcataatgaa 860 ttalactagaa atalactittgc aag gagar cc aaagctaaga ccc.ccgaaac Cagacgagct 920 accitaaraac agctaaaaga gcacaccc.gt citatgtagca aaatagtggg aagatttata 98O ggtagaggcg acaaacctay cqagcctggit gatagotggit totccaagat agaatcttag 20 40 ttcaactitta aatttgcc cr cagaaccotc taaatcc cct totaaattta aytgttagtc 2100 caaagaggaa cagotcitttg gacac tagga aaaaaccttg tagagagagt aaaaaattta 216 O acac coatag taggcc-taaa agcagocacc aattaagaaa gogttcaagc ticaac acco a 2220 citacctaaaa aatcc.caaac atatvactga acticcitcaca ccmaattgga ccaatctato 228O accotataga agaactaatg ttagtatrag talacatgaaa acattcticct cyg cataagc 234. O citgcgtcaga tyaaaacrct gaactgacaa tta acagocy aatat citaca atcaiaccaac 24 OO aagt catt at tacccycact gtcaa.cccaa cacaggcato citcataagga aaggittaaaa 2460 aaagtaaaag gaactcggca aaycttaccc cqcctgttta ccaaaaa.cat caccitctago 252O atcaccagta ttagaggcac cqcctg.ccca gtgacacatg tittaacggcc gcggtaccct 258O raccgtgcaa aggtag cata atc acttgtt cottaaatag ggacctgitat gaatggctyc 264 O acgagggitty agctgtc.tct tacttittaac cagtgaaatt gacct gcc.cg tdaag aggcg 27 OO ggcatracac agcaagacga gaag accota toggagctitta atttattaat gcaar carta 276 O. 
cctaac arac ccacaggtoc taalactacya arcct gcatt aaaaattitcg gttgggg.cga 282O ccitcggagca gaaycmalacc toc gag cagt acatgcyaag acytcaccag toaaag.cgaa 2880 ctacyatact caattgatcc aataacttga ccaacggaac aagttaccct agggataa.ca 2.940 gc gcaatcct attctagagt coatatoaac aatagggittt acg acct cqa tottggatca 3OOO ggacatcc cr atggtgcago C gotiattaaa gqttcgtttgttcaacgatt aaagticcitac 3060 gtgatctgag titcagaccgg agyaatccag gtcggtttct atctancttic aaatticcitcc 312 O citgitacga aa gacaa.gaga aataaggcct actitcacaaa gogcctt.ccc cc.gtaaatga 318O tatcatctoa acttagyatw ayaycyacac ccacccaaga rcagg gtttgttaagatggc 324 O agagcc.cggit aatc.gcataa aacttaaaac tttacagtca gaggttcaay to citcttctt 33OO aacaacayac coatgrocaa cctoctactic citcattgtac coattctaat cqcaatggca 3360 titcc taatgc tyaccgaacg aaaaattcta ggcyatatac alactacgcaa aggcc.ccaac 342O gttgtaggcc cctacggrct act acaaccy titcgctdacg ccataaaact citt caccalar 3480 gag.cccctaa aaccogccac atctrc catc acyctvtaca to accgc.ccc gaccttrgot 354. O US 2005/O123913 A1 Jun. 9, 2005 49

-continued citcaccatyg chottctact atgarccocc citc.cccatac cca accocct ggitya acctic 3600 aaccitagg cc toctattitat totagccacc totagoctag cc.gyttactic aatccitctga 3660 to aggrtgag catcaaactic aaactacgcc citratcggyg cactg.cgagc agtagcc car 372 O acaatctoat atgaagttcac cctag coatc attctrictat caa.cattact aataagtggc 378 O. toctittalacc totconcoct tat cacarca caagarcacc totgattact cotrocatca 384 O tgroccytgg ccataatatg atttayctoc acactagoag agaccaa.ccg aaccoccittc 39 OO gaccttgcc.g aaggggartC mgaactrotc to aggcttca acatc gaata cqc.cgcaggc 396 O ccctitc.gc.cy tattottcat rgc.cgaatac acaaac atta ttataataala cacccticacc 4020 actayaatct toctaggaay aacrtatrac gcactictocc citgaacticta cacaacatat 408 O tttgtyacca agaccotact to traccitcc. citgttcytat grrttcgaac agcataccc.c 414 O cgattacgct acgaccarct catacaccitc. citatgaaaaa act tcc tacc actica.cccita 4200 gcrtitactta tat gayatgt ytccrtacco aytacaatct ccagoatycc cccitcaaacc 4260 taagaaatat gtotgataaa agagttacitt tgatagagta aataatagga gytta aacco 4320 cottatttct aggacyatga gaatc galacc catcc ct gag aatccaaaay totcc.gtgcc 4.380 accitat circa ccc.catcc ta. aagta aggto agctaaataa gctatogggc ccataccc.cg 4 440 aaaatgttgg ttawa.cccitt ccc.gtactaa ttaatcc cct gg.cccaacco rtcatctact 4500 citaccrtytt trc agg caca citcatcachg cgctaagcto rcactgattt tttacct gag 45 60 tagg cctaga aataalacatr ctagoyttta titcCarttct alaccaaaaaa ataalacc citc gttccacaga agctgc catc aagtayttcc toacirca agc aaccq catcc ataatcctitc 4680 taatagoyat cctcytcaac aatatactict cc.ggrdaatg alaccatalacc aatactacca 474. O aycaatactc atcattaata atcatartrg citatagdaat aaaac tagga atagocc cct 4800 ttcacttctg agtcc.cagar gttrc coaag gcirccccitct racatcc.ggc citgcttctitc 4860 to acatgaca aaaactagoc ccyatctoaa toatatacca aatctoycoc toactaracg 4920 taagccittct cct cacticitc. tdaatctitat ccatcatagy agg Cagttga ggtggaytala 4.980 aCCaala C.C. Ca gctrcgcaaa atcytag cat acticcitcaat tacccayata ggatgrataa 5040 takcarttct accgtacaac ccyaacataa ccatt cittaa. 
tittaactatt tatatyatcc taactacyac cgcattccta citact caact taaactc.cag cacca cracc ctrc tactat 5 160 citc.gcaccitg aaacaa.gmta acatgactaa cacccittaat to catccacc citccitctocc 5220 taggaggcct roccocircta accggcttitt tgc cyaaatg gry cattatc gaagaattca 528 O

Caaaaaaa. tagcctcaity atc.cccacca tdatagocac catcaccostml Ittalacctict 5340 actitctacct acgcctaatc tacticcacct caatcacact acticciccatr toyaacaacg 5 400 taaaaataala atgacartitt galacayacaa aac coacco aytccitcccc acacticatcr 546 O ccctyaccac rctacticcita ccitat citc.cc. cyttyatact aataatctta tagaaattta 552O ggittaaatac agaccalagag ccttcaaag.c cct cagtaag ttgcaatact taatttctgy 558 O racagotaag gactgcaaaa ycycaytctg catcaactoga acgcaaatca gcy actittaa 5640 ttaa.gctaag cc cityactag accalatggga cittaaa.ccca caaac actta gttaa.cagot 5700 aag cacccita rtcaactggc ttcaatctac ttctoccgcc gcc.gggaaaa aaggcgggag 576 O. aag.ccc.cggc agritttgaag citgcttctitc gaatttgcaa ttcaatatga raaycaccitc 582O US 2005/O123913 A1 Jun. 9, 2005 50

-continued rgag Cyggta aaaagaggcC tarccoctdt citttagattt acagtccaat gcttcactca 588 O gccattttac citcaccocca citgatgttcg cc gaccgttg actattotct acaalaccaca 594 O aaga cattgg racactatac citattattog gcgcatgagc tggrg tycta ggcacagotc 6 OOO taagccitcct tatto gagcc gagctroggyC agcCaggcaa. cctity taggit aac gaccaca 6060 totacaacgt yatcgtyaca gcc catgcat ttgtaataat cittyttoata gtaataccca 61.20 toataatcgg aggctttggc aactgactar tycoccitaat aatyggy gcc ccc.gatatgg 618O crittyccc.cg catalaacaac ataagcttct gactcittacc yccctcycto citacticcitgc 624 O togcatctg.c tayagtrag gcc.ggagcag galacaggttg aacagtctac cotcc cittag 6300

Cagg galacta citc.ccaccot ggarc citcc.g tag acctaac catcttctoc ttacaccitag 6360 caggitritcto ytctatotta ggggc catca ayttoatcac aacaattatc. aatataaaac 642O cccctgcc at aaccolaatac caaacgc.ccc tottcgtctg atc.cgtocta atyacagoag 64.80 to citadttct Ictatotic to ccagticcitag citgctgg cat cacyatact a cita acagacc 654. O gcaaccityaa caccaccittc. titc gaccc.cg CCggaggagg agacccyatt citataccaac 6600 accitatyctg atttittcggit caccctgaag tittatattost tat cotacca ggctt.cggaa 6660 taatctocca tattgtaacy tactactcc.g gaaaaaaaga accatttgga tayataggya tggtotgagc taitratatoa. attggctitcc trigggttitat cgtgtgagcr caccayatat 678 O. ttacagtagg aatagacgta gacacacgag catayttcac Ctcc.gcyacc ataatcatcg 6840 citaticciccac cgg.cgtcaaa gtatttagct gactmgccac acticcacgga agcaatatga 69 OO aatgatctgc tgcagtgcto tgagcc.ctag gattcatytt totttitcacci gtaggtggcc 696 O tractogcat tgt attagca aacticatcro tag acatcgt. act acacgac acgtactacg ttgtag cyca citt.ccactat gtoctatoaa triggrgcwgt atttgccatc ataggrggct 708O to attcactg attitcc ccta ttctdaggct acaccctaga ccaaaccitac gocaaaatcc 714. O attt cricitat cattrittcatc. ggcgtaaatc taacyttctt cccacaacac tittctmggcc trtcc.ggaat gcc cc.gacgt. tactcrgact accoc gatgc atacaccaca toaaayrtcc 726 O tatcatctgt riggytcattc atttctictaa cagoagtaat attaataatt ttcatgatyt 732O gaga agccitt cgcttcraag cgaaaart.cc taatagtaga agaac cotcc ataaacctgg agtgactaya tggatgcc cc ccrcc.ctacc acacattcga agarc cogta tacataaaat 440 ctaracaaaa. aaggaaggaa tog aaccocc caaagytggit ttcaa.gc.caa ccycatggcc 7500 to catgacitt tittcaaaaag rtattagaaa alaccatttica taactttgtc. aaagttaaat 756 O yataggctaa rtccitatata tottaatggc acatgcagor caagtagg to tacaagacgc tacwtc.ccct atcatagaag agctyatyac citttcat gay cacroccitca tarty attitt 768O ccttatctg.c ttyytartcc tgitatgcc ct titt.cctaa.ca citcacaacaa alactalactala 774. 
O tacyaacatc to agacgcto aggaratra raccgtotga actatoct9.c cc.gc.catcat 7800 cctagtccitc atc.gc.cctcc catccctacg catcc tittac ata acagacg aggtoaayga 786 O yccytcycitt accatcaaat caattggc.ca ccaatggtac tgaaccitacg agtacaccga 7920

Citacgg.cgga citratic titca acticcitayat act tcc.ccca ttattoctag aaccaggcga 798O cctg.cgacitc cittgacgtyg acaatcgagt agtrcticcor attgaarc.cc ccattcgitat 804. O aataattaca to acaagacg tottgcactc atgagctgty cc cacaytag gottaaaaac 8100 US 2005/O123913 A1 Jun. 9, 2005 51

-continued agatgcaatt ccmggacgto taalaccaaac cactittcacc gytacac gro crggrgtata 81 60 citacggtoaa tgctotgaaa totgyggagc aalaccacagy ttcatrc coa togtoctaga 8220 attaattic cc citaaaaatct ttgaaatrgg rccc.gitatty accostatarc accoccitcta cc cc cyctag arc coacygt aaagctaact tag cattaac cittittaagtt aaagattaag 8340 agar ccaa.ca cct citttaca gtgaaatgcc ccalactaaat actaccrtrt groccaccat 84 OO aaty accocc ataytcctta cactattyct catcaccoaa. citaaaaayat taalacacaar 84 60 citaccacyta cyycccitcac caaar cc cat aaaaataaaa. aattataa.ca aaccotgaga 852O accaaaatga acgaaaatct gttcrottca tty attgccc ccrcartcct agg cctrccc gccrcagtac tgatcattct attitc.ccc.cit citattgay.cc ccaccitccala attatctoatc. 864. O aacaa.ccgac taatyaccac ccaacaatga citaatcaaac talaccitcaaa acaaatrata 87 OO rcyayacaya acactaaagg rcgaacct ga toycttatac tagitatcctt aatcatttitt 876O attrocacala citalacctic ct mggrcticcitr ccyyacticat ttacrccaac cacccalacta 882O totataalacc tagccrtroc catcc cctta tgagcrggcr cagtgattat agg cytycgc 888O totaag atta aaaatgcc ct ag.cccactitc ytrocacaag gcacaccyac accocittatc 894 O ccyatactag ttattatcga arc catcago citact cattc. alaccalatago Cotroccgta 9 OOO cgc.cta accg citalacattac tgcaggccac citacticatgc aycta attgg aarcrocacc

Ctag caatat Craccaytaa cct tcc.ctcy accittatca tdytcacaat tctrattotr 912 O citractatoc tagaartcgc tgtc.gc.ctta rtccargcct acgttittcac actyctagta 918O agccitctacc tgcacgacaa cacataatga cccaccaatc. rcatgccitat catatartaa 924 O arccoagyco atgaccccta acrggggcCC tytcagocct cctaatgacc tocc ggyctag ccatgttgatt ycactitccac to cayaacgc to cityatact aggcctricta accaryacac 936 O talaccatata ccalatgrtgg cgc gatgtaa Cac gagaaag cmcataccala ggccaccaca 9420 caccacctgt ccaaaaaggc citt.cgatayg ggatartcct atttattacc tdagaartitt 94.80 ttittctitc.gc aggatttittc tgagc ctityt accactc.cag cctagoccct acco cycaay 954. O taggrggrCa citgroccosa acaggcatca cccdrctaala tocccitagaa rtcccactyc 96.OO taalacacatc cgitattactic gCatCaggag tritcaatcac citgag cycac catagtctaa 9 660 tagaaarcaa cc.gaalaccala aya attcaag cactgctyat tacaattitta citgggtotct 972 O attittaccct cctacaagcc totagagtact togartcitcc cittcaccatt tocgacggca 978O totacggcto aac attittitt gtagccacag gct tccaygg amtwcacgtc attattggct 984 O caacttitcct cactatotgc ttcatcc.gc.c alactaatatt toactittaca toccaaacatc. 9900 actittggctt ygaagcc.gc.c gcc toatact gro attttgt agatgtggity tactayttc 996 O tgitatrtcitc catctaytga tgagggtott acticttittag tataaatagt accgittaact 10 O20 to caattaac tagytttgac aac attcaaa. aaagagtaat aaactitc.gcc ttaattittaa 10080 taatcwacac cctcctagoc ttacitactaa taatyatyac attittgacta coacaactca 101 4 0 ayggctacat rsaaaaatcc. accocittacg artgcggctt csacccitata toccc.crcc.c 10200 gcgtcc ctitt ctic cataaaa. ttcttcttag tagctatyac cittcttatta ttygayctag 10260 aaattgcc ct cottttacco citaccatgag cc.ctacaaac aactaacctr corctaatag 10320 y tatrtcatc cct cittatta atcatcatcc. tagcc citrag totggcctay gagtgacitac 1038O US 2005/O123913 A1 Jun. 9, 2005 52

-continued aaaaaggatt agacitgarcy gaattggtay ataktittaaa caaaacraat gattitcg act 04 40 cattalaatta tgataatcat aty taccalaa tg.cccctcat ttacataaat attatactrg O5 OO cattyacc at citcactitcta ggaatactag tatatogcto acaccitcaitr to:ctic cctac O560 tatgccitaga aggaataata citator citrit toattatago tactcitcaya accotcaiaca O 620 cc cacticcict cittagoyaay attgtrccta ttgccatayt agtyttygcc gcc td.cgaag O 680

Cagcggtrgg cctagoccta citagtctdaa totccalacac atatggccita gacitaygtac Of 40 atalaccitaala cc tact.ccala tgctaaaact aatcgtocca acaattatay try taccact gacrtgacity to caaaaarc acatalatytg aatcaiacaca accaccoaca gcctaattat O 860 tag catcatc ccyctrictat tittittalacca aatyaacaac aacctattta gctgytcccy O920 alaccittitt.cc. to C gacccyc taacaa.cccd. ccitcc taata ctaacy acct g actoctacc ccts acaatc. atggcaa.gc.c arc.gc.cactt atccarygaa ccrctatoac gaaaaaaact O4. O citacct citct a tactaatct cc.ctacaaat citcottartt ataac attca crgccacaga 100 actaatcata titttatatost tottcgaaac cacactitatic cccaccytgr citatcatcac 160

CCgatgrggc arc carycag aacgc.ctgaa cgcaggcaca tactitccitat totay accot 220 agtaggcticc citt.ccc.citac tdatc.gcact ratttayact cacaa.caccc taggcto act 280 a.a.ac.attota citactyacyc toactg.ccca agaactatoa aactcct gag cyaacaactt 34 O aatatgacta gCttacacaa trg cytttat agtaaarata cct citttacg gactic cactt 400 atgactcc ct aaagcc catg togaa.gc.ccc catc.gctggg totaatagtac ttgcc.gcagt 460 acticttraala Ctaggyggct atggtataat acgc.citcaca citcattctda accocctgac aaaa.ca cata gcctaycoct to cittgtact atcccitatga ggcataatta taacaagctic catctgcctr cgacaaacag accitaaaatc. rcticattgca tactcittcaa toagccacat 640 rgcc citcgta gtrrcago.ca ttcticatc.ca aacyc cctoga agcttcaccg gcgcagt cat FOO yctdataatc gcc cacggrc tyacatccitc attactatto tgcctagoaa actcaaacta 760 cgaacgyact cacagtc.gca toataatcct citctoaagga cittcaaactic trctic cc act 820 aatagattitt tgatgactitc tagcaa.gc.ct cgcyaac citc gccttaccoc coactattaa cctrictroga garctic toyg tgctagtarc cacrittcticc tgatcaaata toactictoct 940 actyacrgga citcaiacatrc tarticacarc ccitat acticc citctacatat titaccacaiac acaatgrggc toactica.ccc. accacattaa. caa.cataaaa. cc citcattya cac gagaaaa 2060 cacccitcaitr titcatacacc takccccoat totoctocta toccitcaacc ccg acatcat 2120 yacc gg gttt tocycttgta aatatagittt aaycaaaa.ca toagattgttgart cygacaa 218O cagaggctta cg accocitta tttaccgaga aagct cacaa gaactgctaa citcrtrc cyc 2240 catgtctrac aac atggctt totcaactitt taaaggataa cagot atcca ttggtottag 2300 gcc.ccaaraa ttittggtgca acticcaaata aaagtaataa ccatgyacac tactatarcc 2360 roccitalaccc. trictitcoct aattic coccc. atcct tricca cccitcrttaa cccyaacaaa 2420 aaaaacticat accoccatta tgtaaaatcc attgtc.gcat ccaccittitat tatcagycto 24.80 titcc.ccacala calatattoait rtgcc trgac caagaagtya ttatctoraa citg acactgr 2540 gccaca acco aaaaa. C.C. Ca gctcitcccta agcttcaaac tag actactt citccataata 2600 ttcatccctg trg cattgtt cgttacatgir toyaycatag aattctoact gtgatatata 2660 US 2005/O123913 A1 Jun. 9, 2005 53

-continued aacticagayo craac.attaa. tdagttctitc aartatic tac tdatyttoct aattaccatr 2720 ctaatcttag ttaccgcyaa calaccitatto caactgttca toggctgrga riggcgtagga 2780 attatatoct tottgctdat cagttgat gr tacgc.ccgag crgatgccala cacagoagcc 284 O attcaag car toctatacala cc.gitatcggc gatatoggyt tyatcct cqc cittagcatca 29 OO tittatccitac acticcaactic atgaga.cccw caacaaatar cccttctraa cigctaatcca 2960 agccitcm.ccc Crctac tagg ccticcitcc ta. gCagCagcrg gcaaatcago coaattaggy 3020 citccaccc.cit gacitcc ccto agc catagaa ggc.cccacyc cagtctorgc cctactccac toaa.gcacta tagttgtagc mggrrtctitc titact catcc. gctitccaccc cctarcagaa 314 O aayarccord taatccaaac totalacacta tgcttagg.cg citatcaccac totrittygca 3200 gcagtctg.cg cycttacaca raatgacatc aaaaaaatcg tagccttcto cacttcaagt 326 O carctaggac tdatartagt yacaatcggc atcaiaccaac cacacctago attcctgcac 3320 atctgtacco acgccttctt caaag.ccata citatttatgt gctcc.ggrtc catcatccac 3380 alaccittaa.ca atgaacaaga tatto gaaaa ataggaggac tacticaaaac cataccticts 34 40 actitcaiac ct cc citcaccat tgg cagocta gcattar cag gaatriccittt Cotyacaggb 3500 ttctaytcca argaccacat catcgaalacc gcaaacatat catacacaaa cqcctgagcc 356 O citritictatta citctdatc.gc tacct coctr acargc.gc.ct ayagc acto g rataatyctt 362O citcaccostala Cagg to aacc ycgctitcc cy rocct tact acattaacga aaataac Coc 3 680 accotactala accoccattaa. acgc.ctgroa gcc.ggaa.gc.c trittc.gcagg attycto att 3740 actaacaa.ca titt.cccd.cric atcc.ccctitc. 
Caaaaa.ca tocccctcy a cotaaaactic 38 OO acrgcc citcg cygtoacyyt cctaggrott cita acagccc tag acctcaa citaccta acc 3860 aacaaactta aaataaaatml cccacyatgc acattittatt totccaacat actmggattc 392 O tacyctwsca to acacaccg cacaatcc cc tatctag sco ttctyric gag ccaaaac citr 398O cc cc tacticc toctag accw aacct gacta gaaaarctay trc cyaaaac alatytcacag 4040 caccalaatct ccaccitccrt catcaccitcd accoaaaaag gcataatyaa actytayttc 41 OO citc.totttct tottcc Crct catcc tar cc citacticcitaa tdacatarcc trittccc.ccg 416 O agcaatyt.ca attacaayat ayacaccaac aaacaatgty carcicagtra cyacyactaa 4220 ycaacgcc.ca tarticataca aagcc.ccc.gc accalatagga to citc.ccgaa tsaac cotga cccytctoct toatalaatta ttcagotyco yacactayya aagtttacca caaccaccac 434 O cc catcatac tott to acco acago accala yccyaccitcc atcs.ctalacc ccactaaaac 4 400 actcaccalag accitcaa.ccc. citg accocca tgccticagga tacticcitcaa tag cyatcro 4 460 tgtagtatay ccaaag acaa ccaycatycic cc.ctaaataa. aytaaaaaaa citattaalacc 4520 catata acct cc.cccaaaat to agaataat a CaCaCCC accacrocirc waacaatcar 4580 tactaar.ccc. ccataaatag gagar ggCtt agaagaaaac cc cacaaacc ccattactaa 4 640 accoacactic aacagaaa.ca aag catayat cattattoto gcacggacta carcicacgac 47 OO caatgatatg aaaaac catc gttgtattitc aactaca aga acaccalatga ccc caatacg 476 O caaaaytarc cc.ccitaataa. aaytaattaa ccrcticaytc atcg accitcc cyacc ccatc 482O caa.catctoc gcatgirtgaa actitcggcto acticcittggc ryctgcctga toctocaaat 488 O caccacagga citattoctag ccatrcacta ytcaccagac gccitcaaccg ccttittcatc 494. O US 2005/O123913 A1 Jun. 9, 2005 54

-continued aatcgc.ccac atcactic gag acgtaaatta y ggstgaayc atcc.gctacc titcacgc.cala 5 OOO tggcgc.ctica atattyttta totgc citctt Cotrocacatc ggrcgaggcc tat attacgg 5060 atcatttcto tacticagaaa cct gaaacat cqg cattatc citcctgcttr carcyatago 5 120 aacago ctitc ataggy tatgtc.citc.ccgtg aggccaaata to attctgag grgccacagt 5 18O aattacaaac titactato.cg ccaycccata cattggrmca gacctag tyc aatgaatstg 5240 aggrggctac toagtaraca rtcccaccct cacacgatto tttacctittc actitcatctt 53OO rccottcatt attgcary cd tarcarcact coaccitccta ttcttrcacg aaacgggrtc 536 O aaacaaccoc citaggaatca cotcc cattc cqataaaatc acctitccacc cittactacac 542O aatcaaag ac roccitcggct trcttctott cmittctotcc ttaatracay taacactatt 54.80 citcaccwgac citcc targcg accolagacaa ttayacc cya gccaa.cccct taaay acco c 554. O tocc cacatc aag coc gaat gatatttcct attc.gc.ctac acaattctoc gatcc.gtc.cc 5 6.OO taacaarcta ggaggcgtcc ttgccytayt act atccatc citcatyctag caataatcc c 566 O yaycotccay atatocaaac aacaaag.cat aatatttcgc ccactaagcc aatcactitta 572O titgroticcita rcc.gcagacc toc torttct aacctgaatc ggaggrdaac cagtaagcta 578O cccytttacc atyattggac aartarcatc crtactatac titcroaacaa toy taatcct 584 O aataccaayt atctoccitaa ttgaaaacaa aatactcaaa toggscotgtc. cittgtagtay 59 OO aaaytartac accagtcttg taarcCrrar aygaaaacyt yytto Caagg acaaatcaga 596 O gaaaaagyct ttaacticcac cattagcacc caaagctaag attctaattit aaactaytct 6O20 citgttcttitc atggggargo agatttgggit rccacco aag tattgacitya yocaycaa.ca 608 O accgcyatgt atytcgtaca ttact gcyag yoamcatgaa tatygyacvg taccataaay 614 O actyrayyac citritagtaca trmaamyyya ryc cryatca ammyyyyvyc cyyatgctta 62OO caag cargya crryaaycra coyycarcyr yyayr catya ryygyar cyc caamryyrcy 6260 yctymycyay yagratayca acarasyyay yyrycytyaa cagyacatro yacatrww.ry 632O catyyrycgt acatagdaca ttryagtcaa atcyyyycity gy.cccyaygg atgaccc.ccc 6,380 toagataggr ritcc.cittgro cac catcc to cqtgaaatca atatoccgca caagagtrmit 6 440 actcitcct cq citc.cggg.ccc ataac acttg g g g g tagcta aartgaactg. 
tatcc gacat 6500 citggttccita cittcagggyo ataaagycta aatagoccac acgttcc cct taaataagac 656 O atcacgatg 6569

<210> SEQ ID NO 2
<211> LENGTH: 16569
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (3106)...(3106)
<223> OTHER INFORMATION: n at 3106 is a deletion

<400> SEQUENCE: 2
gatcacaggt ctatcaccct attaaccact cacgggagct ctccatgcat ttggtatttt 60
cgtctggggg gtatgcacgc gatagcattg cgagacgctg gagccggagc accctatgtc 120
gcagtatctg tctttgattc ctgcctcatc ctattattta tcgcacctac gttcaatatt 180
acaggcgaac atacttacta aagtgtgtta attaattaat gcttgtagga cataataata 240

-continued acaattgaat gtctgcacag coactittcca cacagacatc ataacaaaaa attitccacca 3OO aaccoccoct coccc.gcttctggccacago acttaaacac atctotgcca aaccocaaaa 360 acaaagaacc ctaacaccag cottalaccaga tittcaaattt tat cittittgg cqg tatgcac 420 ttittaacagt caccocccaa citaacacatt atttitcc cct cocactccca tactactaat 480 citcatcaata caaccoccgc ccatcctacc cagdacacac acaccgctgc taaccocata 540 cc.ccgalacca accaaaccoc aaaga caccc cccacagttt atgtagotta cotcc to aaa 600 gcaatacact gaaaatgttt agacgggcto a catcacccc ataaacaaat aggtttggto 660 citagccttitc tattagctct tagtaagatt acacatgcaa goatcc.ccgt to cagtgagt 720 to accotcta aatcaccacg atcaaaagga acaag catca agcacgcago: aatgcagotc 78O aaaacgctta gcc tag ccac accoccacgg gaalacagoag to attalacct ttagcaataa 840 acgaaagttt alactaagcta tacta accoc agg gttggto: aattitcgtgc cagocaccgc 9 OO gg to acacga ttalacc caag to aatagaag ccggcgtaaa gagtgttitta gatcaccc.cc 96.O tocc caataa agctaaaact caccitgagtt gtaaaaaact coagttgaca caaaatagac O20 tacgaaagtg gctittaa.cat atctgaacac acaatagota agacccaaac toggattaga O8O taccc.cacta to cittagc.cc taalaccitcaa cagittaaatc aacaaaact g citc.gc.cagaa 14 O cactacgagc cacagottaa aactcaaagg acctggcggit gcttcat atc cct citagagg 200 agcctgttct gtaatcgata aaccocq atc aacct cacca cctottgctic agcctatata 260 cc.gc.catctt cagoaaacco to atgaaggc tacaaagtaa go.gcaagtac coacgtaaag 320 acgttagg to aaggtgtagc ccatgaggtg gcaagaaatg ggctacattt totaccc.ca.g 38O aaaactacga tagcc.cittat gaaacttaag g g togalaggt ggatttagca gtaaactaag 4 40 agtagagtgc titagttgaac agggccct ga agcgc.gtaca caccgc.ccgt cacccitcctic 5 OO aagtatacitt caaaggacat ttalactaaaa coccitacgca tittatataga ggaga caagt 560 cgta acatgg taagtgtact ggaaagtgca cittggacgaa ccagagtgta gotta acaca 62O aag cacccaa cittacactta ggagatttica actta acttg accgctotga gctaalaccita 680 gcc.ccaaacc cactccacct tactaccaga caaccittagc caaac cattt accoaaataa 740 agtatagg.cg atagaaattgaaacctgg.cg caatagatat agtaccgcaa goggaaagatg 800 aaaaattata accaag cata atatagoaag gacta accoc tatacct tct 
gcataatgaa 860 ttalactagaa atalactittgc aag gagagcc aaagctaaga ccc.ccgaaac Cagacgagct 920 acctaagaac agctaaaaga gcacaccc.gt citatgtagca aaatagtggg aagatttata 98O ggtagaggcg acaaacctac cqagcctggit gatagotggit totccaagat agaatcttag 20 40 ttcaactitta aatttgcc.ca cagaaccotc taaatcc cct totaaattta actgttagtc 2100 caaagaggaa cagotcitttg gacac tagga aaaaaccttg tagagagagt aaaaaattta 216 O acac coatag taggcc-taaa agcagocacc aattaagaaa gogttcaagc ticaac acco a 2220 citacctaaaa aatcccaaac atataactga acticcitcaca cocaattgga ccaatctato 228O accotataga agaactaatg ttagtataag talacatgaaa acattcticct cog cataagc 234. O citgcgtcaga ttaaaacact gaactgacaa tta acago.cc aatat citaca atcaiaccaac 24 OO aagt catt at taccct cact gtcaa.cccaa cacaggcato citcataagga aaggittaaaa 2460 aaagtaaaag gaactcggca aatcttaccc cqcctgttta ccaaaaa.cat caccitctago 252O US 2005/O123913 A1 Jun. 9, 2005 56

-continued atcaccagta ttagaggcac cgcct gcc.ca gtgacacatg tittaacggcc gcggtaccot 258O aaccgtgcaa aggtag cata atcacttgtt ccittaaatag ggacctgitat gaatggctoc 264 O acgagggttc agctgtctdt tacttittaac cagtgaaatt gacCtgc.ccg toga agaggcg 27 OO ggcatalacac agcaagacga. gaag accota tggagctitta atttattaat gcaaacagta 276 O. coctaacaaac ccacaggtoc taalactacca aacctgcatt aaaaattitcg gttgggg.cga 282O ccitcggagca gaacco aacc to C gag cagt acatgctaag actitcaccag toaaag.cgaa 2880 citactatact caattgatcc aataacttga ccaacggaac aagttaccct agggataa.ca 2.940 gc gcaatcct attctagagt ccatatosaac aatagggittt acg acct cqa tottggatca ggacatcc.cg atggtgcago cgctattaaa ggttcgtttg ttcaacgatt aaagttccitac 3060 gtgatctgag ttcagaccgg agtaatccag gtoggitttct atctanctitc. aaatticcitcc. 312 O citgitacgaaa ggaCaagaga aataaggcct actitcacaaa. gc gcc titccc cc.gtaaatga 318O tatcatctoa actitag tatt attacccacac ccacccalaga acagg gtttgttalagatggc 324 O agagc.ccggit aatc.gcataa aacttaaaac tttacagtca gaggttcaat tccitcttctt aacaa.catac ccatggccala ccitcc tactic citcattgitac ccattctaat cqcaatggca 3360 titcctaatgc ttaccgaacg aaaaatticta ggctatatac aactacgcaa aggc.cccaac 342O gttgtagg cc ccitacgggct actacaa.ccc. titcgctgacg ccataaaact ctitcaccalaa. 3480 gag.cccCtaa aac Cogccac atctaccatc. accotctaca to accqCCCC gaccttagct 354. O citcaccatcg citottctact atgaaccocc citc.cccatac ccaaccocct g g toaacctc 3600 aaccitagg cc toctattitat totagccacc totagoctag cc.gtttactic aatccitctga 3660 tdagggtgag catcaaactic aaactacgcc citgatcggcg cactg.cgagc agtagcc cala 372 O acaatctoat atgaagttcac cctag coatc attctactat caa.cattact aataagtggc 378 O. 
toctittalacc totccaccot tat cacaa.ca caagaac acc totgattact cotgccatca 384 O tgaccottgg ccataatatg atttatctoc acactagoag agaccaa.ccg aaccoccittc 39 OO gaccttgcc.g aaggggagtC cgaac tag to to aggcttca acatc gaata cqc.cgcaggc 396 O ccctitc.gc.cc tattottcat agc.cgaatac acaaac atta ttataataala cacccticacc 4020 actacaatct toctaggaac aacatatgac gcactictocc citgaacticta cacaacatat 408 O tttgtcacca agaccotact totalaccitcc. citgttctitat gaatticgaac agcataccc.c 414 O cgattacgct acgaccaact catacaccitc. citatgaaaaa act tcc tacc actica.cccita 4200 gcattactta tatgatatgt ctic cataccc. attacaatct ccago attcc cccitcaaacc 4260 taagaaatat gtotgataaa agagttacitt tgatagagta aataatagga gotta aacco 4320 cottatttct aggacitat ga gaatc galacc catcc ct gag aatccaaaat tctocqtgcc 4.380 accitat caca ccc.catcc ta. aagta aggto agctaaataa gctatogggc ccataccc.cg 4 440 aaaatgttgg ttatacccitt ccc.gtactaa ttaatcc cct ggcc.caa.ccc gtcatctact 4500 citaccatctt tgcagg caca citcatcacag cgctaag citc gcactgattt tttacct gag 45 60 tagg cctaga aataaacatg ctagotttta titccagttct alaccaaaaaa ataalacc citc gttccacaga agctgc catc aagtatttcc to acgcaa.gc aaccq catcc ataatcctitc 4680 taatag citat cct cittcaac aatatactict cc.ggacaatg alaccatalacc aatactacca 474. O atcaatactic atcattaata atcataatag citatagdaat aaaac tagga atagocc cct 4800 US 2005/O123913 A1 Jun. 9, 2005 57

-continued ttcacttctg agtc.ccagag gttacccaag goaccc.citct gacatcc.ggc citgcttcttic 4860 tdacatgaca aaaactag co cocatctoaa toatatacca aatctotccc toactaaacg 4920 taagccttct cotcactcitc. tcaatctitat coatcatago agg cagttga ggtggattaa 4.980 accaaaccoa gctacgcaaa atcttagcat acticcitcaat tacco acata ggatgaataa 5040 tag cagttct accgtacaac cctaacataa ccattcttaa tittaactatt tatattatcc 51OO taactactac cqcattccta citactcaact taaactccag caccacg acc ctactactat 5 160 citc.gcaccitg aaacaagcta acatgactaa caccottaat tccatccacc citcctcitccc 5220 taggaggcct gcc.ccc.gcta accggcttitt tocccaaatg ggc cattatc gaagaattca 528 O caaaaaacaa tag cotcatc atccccacca toatagocac catcaccctc. cittaaccitct 5340 acttctacct acgcctaatc tactccacct caatcacact acticcccata totaacaacg 5 400 taaaaataaa atgacagttt galacatacaa aacco accoc attcc tocc c acact catcg 546 O cccttaccac got acticcita cottatctocc cittittatact aataatctta tagaaattta 552O ggittaaatac agaccalagag cottcaaagc cct cagtaag ttgcaatact taatttctgt 558 O aacagotaag gacitgcaaaa ccc cactctg catcaactga acgcaaatca gcc actittaa 5640 ttaa.gctaag ccct tactag accalatggga cittaa accoa caaac actta gttaa.ca.gct 5700 aag cacccita atcaactggc titcaatctac ttctoccgcc gcc.gggaaaa aaggcgg gag 576 O. 
aag.ccc.cggc aggtttgaag Ctgcttctitc gaatttgcaa ttcaatatga aaatcaccitc 582O ggagctggta aaaagaggcc talacc cct gt ctittagattt acagtccaat gottcactca 588 O gccattttac citcacccc.ca citgatgttcg ccg accqttg actattotct acaaaccaca 594 O aaga cattgg alacactatac citatt attcg gcgcatcago toggagtc.cta ggcacagotc 6 OOO taagccitcct tattogagco gagctgggcc agc.caggcaa ccttctaggit aac gaccaca 6060 totacaacgt tatcgtcaca gcc catgcat ttgtaataat cittcttcata gtaatacco a 61.20 toataatcgg aggctttggc aactoactag titc.ccctaat aatcggtgcc ccc.gatatgg 618O cgttitccc.cg cataaacaac ataagcttct g actottacc toccitctdtc citacticcitgc 624 O togcatctgc tatagtggag gocggagcag galacaggttgaac agtctac cct coctitag 6300 Cagg galacta citc.ccaccct ggagc citc.cg tag acctaac catcttctoc ttacaccitag 6360 caggtotcitc citctatotta gggg.ccatca atttcatcac aacaattatc aatataaaac 642O cccctgcc at aaccoaatac caaacgc.ccc tottcgtotg atcc.gtocta atcacagoag 64.80 toctacttct cotatotcitc ccagtcctag citgctgg cat cactatacta citaacagacc 654. O gcaaccitcaa caccacct to titc gaccc.cg ccggaggagg agaccc.catt citataccaac 6600 acctattotg atttittcggit caccotgaag tittatattot tatcctacca ggctitcggaa 6660 taatctocca tattgtaact tactactc.cg gaaaaaaaga accatttgga tacataggta 672O tggtotgagc tatgat atca attggctitcc tagggittitat cqtgtgagca caccatatat 678 O. ttacagtagg aatagacgta gacacacgag catattt cac citcc.gctaccataatcatcg 6840 citat coccac cqgcgtcaaa gtatttagct gacitc.gc.cac acticcacgga agcaatatga 69 OO aatgatctgc tigcagtgcto tdagcc.ctag gatto atctt totttitcacci gtaggtggcc 696 O tgactogcat totattagca aactcatcac tag acatcgt act acacgac acgtactacg 7 O2O ttgtagcc.ca citt.ccactat gtccitatcaa taggagctgt atttgccatc ataggaggct 708O US 2005/O123913 A1 Jun. 9, 2005 58

-continued to attcactg attitcc ccta ttctdaggct acaccctaga ccaaacctac gocaaaatcc 714. O atttcactat catattocatc gg.cgtaaatc taactittctt cocacaacac tittctdggcc 72OO tatc.cggaat gcc.ccgacgt tactcggact accoc gatgc atacaccaca toaaa.catcc 726 O tatcatctgt aggctcattc atttctotaa cagoagtaat attaataatt ttcatgattit 732O gaga agccitt cqctitcgaag cqaaaagttcc taatagtaga agaac cotcc ataaacct gg 738O agtgactata toggatgcc cc ccaccctacc acacattcga agaac cogta tacataaaat 440 citagacaaaa aaggaaggaa togaaccocc caaagctggit ttcaa.gc.caa ccc catggcc 7500 to catgacitt tttcaaaaag gitattagaaa aaccatttca taactttgtc. aaagttaaat 756 O tataggctaa atcctatata tottaatggc acatgcagog caagtagg to tacaagacgc 762O tactitcccct atcatagaag agcttatcac ctittcatgat cacgc.ccitca taatcattitt 768O ccttatctgc titcctagtcc totatgccct tittcctaaca citcacaacaa aactaactaa 774. O tactaa.catc. tcagacgcto aggaaataga aaccqtctga act atcc toc cc.gc.cat cat 7800 ccitagt cotc atc.gcc citcc catcc citacg catcc tttac atalacagacg aggtoaacga 786 O toccitc.ccitt accatcaaat caattggc.ca ccaatggtac toga accitacg agtacaccga 7920 citacgg.cgga citaatctt.ca acticcitacat actitcc.ccca ttattoctag aaccaggcga 798O cctg.cgacitc cittgacgttg acaatcgagt agtactc.ccg attgaag.ccc ccattcgitat 804. O aataattaca toacaagacg tottgcactc atgagctgtc. CCC acattag gCttaaaaac 8100 agatgcaatt cocggacg to taalaccaaac cactittcacc gctacacgac cqggggtata 81 60 citacgg to aa togctotgaaa totgtggagc aaaccacagt titcat gcc.ca togtoctaga 8220 attaatticco citaaaaatct ttgaaatagg gcc.cgitattt accotatago accoccitcta 828O cc.ccctictag agcc.cactgt aaagctaact tag cattaac Cittittaagtt aaagattaag 8340 agaaccaa.ca cctotttaca gtgaaatgcc ccalactaaat act accgitat gg.cccaccat 84 OO aattaccc.cc atacticcitta cactattoct catcaccoaa citaaaaatat taalacacaaa. 84 60 citaccaccita cctoccitcac caaag.cccat aaaaataaaa aattataa.ca aaccotgaga 852O accaaaatga acgaaaatct gttcgctt.ca ttcattgccc ccacaatcct aggccitaccc 858O gcc.gcagtac to atcattct attitccccct citattgatcc ccaccitccaa atatotcatc 864. 
O aacaa.ccgac taatcaccac coaacaatga citaatcaaac talaccitcaaa acaaatgata 87 OO accatacaca acactaaagg acgaacctga totcittatac tagtatcctt aatcatttitt 876O attgccacaa citaaccitcct cqg actoctd cotcactcat ttacaccaac caccoaacta 882O totataalacc tagccatggc catcc cctta tag.cgggca cagtgattat aggctitt.cgc 888O totaag atta aaaatgccct ag.cccacttic titaccacaag goacaccitac accocittatc 894 O cc catactag titattatcga aaccatcago citact cattcaac caatago Cotggcc.gta 9 OOO cgc.cta accg citaacattac toc aggccac citact catgc acctaattgg aag.cgccacc 9 O60 ctag caatat caaccattaa cctitccctct acacttatca tottcacaat tctaatticta 912 O citgactatoc tagaaatc.gc tigtogccitta atccaa.gcct acgttittcac acttctagta 918O agccitctacct gcacgacaa cacataatga cccaccaatc acatgccitat catatagtaa 924 O aaccoagc.cc atgaccc.cta acagggg.ccc totcagocct cotaatgacc tocc ggcc tag 93OO ccatgttgatt toactitccac tocataacgc ticcitcatact aggcc tacta accaacacac 936 O US 2005/O123913 A1 Jun. 9, 2005 59

-continued talaccatata ccaatgatgg cgc gatgtaa Cac gagaaag cacataccala ggccaccaca 9420 caccacctgt ccaaaaaggc citt.cgatacg ggataatcct atttattacc tdagaagttt 94.80 ttittctitc.gc aggatttittc tgagccttitt accactc.cag cctag cocct accocccaat 954. O taggagggca citggcc.ccca acaggcatca cc.ccgctaaa tocccitagaa gtc.ccactcc 96.OO taalacacatc cgitattactic gCatCaggag tat caatcac citgagct cac catagtctaa 9 660 tagaaaacaa cc.gaalaccala ata attcaag cactgctitat tacaattitta citgggtotct 972 O attittaccct cctacaagcc totagagtact to gag totcc cittcaccatt tocgacggca 978O totacggcto aac attittitt gtagccacag gct tccacgg actitcacgtc attattggct 984 O caacttitcct cactatotgc ttcatcc.gc.c alactaatatt toactittaca toccaaacatc. 9900 actittggctt cgaagcc.gc.c gcc toatact ggcattttgt agatgtggitt tact atttic 996 O tgitatgtcto catctattga tgagggtott acticttittag tataaatagt accgittaact OO20 to caattaac tagttittgac aac attcaaa. aaagagtaat aaactitc.gcc ttaattittaa taatcaa.cac cctcctagoc ttacitactaa taattattac attittgacta coacaactca O 140 acggctacat agaaaaatcc accocittacg agtgcggctt cg accotata tocccc.gc.cc gcgtcc ctitt ctic cataaaa. ttcttcttag tagctattac cittcttatta tittgatctag O260 aaattgcc ct cottttacco citaccatgag cc.ctacaaac aactalacct g c cactaatag O320 titatgtcatc cct cittatta atcatcatcc. tagcc.ctaag totggccitat gagtgacitac aaaaaggatt agactgaacc gaattggitat atagtttaaa caaaacgaat gattitcg act 04 40 cattalaatta tgataatcat atttaccalaa. tg.cccctcat ttacataaat attatactag O5 OO catttaccat citcactitcta ggaatactag tatatogcto acaccitcaita toctic cctac O560 tatgccitaga aggaataata citatc.gctgt toattatago tacticitcaita accostcaa.ca O 620 cc cacticcict cittagccaat attgttgccta ttgccatact agt ctittgcc gcc td.cgaag O 680

Cagcggtggg cctagoccta citagtctdaa totccalacac atatggccita gacitacgtac Of 40 atalaccitaala cc tact.ccala tgctaaaact aatcgtocca acaattatat tactaccact gacatgacitt to caaaaaac acataatttg aatcaiacaca accaccoaca gcctaattat O 860 tag catcatc cctic tactat tittittalacca aatcaacaac aacctattta gctgttcccc O920 alaccittitt.cc. tdcg accocc taacaa.cccd. ccitcc taata ctaactacct g actoctacc ccticacaatc. atggcaa.gc.c aacgc.cactt atc.ca.gtgaa ccactatoac gaaaaaaact O4. O citacct citct a tactaatct cc.ctacaaat ctic cittaatt ataac attca cagocacaga 100 actaatcata titttatatost tottcgaaac cacactitatic cccaccittgg citatcatcac 160

CCgatgaggc alaccago cag aacgc.ctgaa cgcaggcaca tact tccitat totacaccct 220 agtaggcticc citt.ccc.citac tdatc.gcact aatttacact cacaa.caccc taggcto act 280 a.a.ac.attota citacticactic toactg.ccca agaactatoa aactcct gag ccaacaactt 34 O aatatgacta gottacacaa tagcttittat agtaaagata cctcitttacg gacitccactt 400 atgactcc ct aaagcc catg togaa.gc.ccc catc.gctggg totaatagtac ttgcc.gcagt 460 acticittaaaa. citagg.cggct atggtataat acgc.citcaca citcattctda accocctgac aaaa.ca cata gcctaccoct to cittgtact atcccitatga ggcataatta taacaagctic catctgccta cgacaaacag accitaaaatc. gct cattgca tactcittcaa toagccacat 640 US 2005/O123913 A1 Jun. 9, 2005 60

-contin ued agcc citcgta gta acago.ca ttcticatc.ca aac coccitga agcttcaccg gc gcagt cat 17 OO totcaitaatc gCC CacgggC ttacatcc to attactatto tgcctagdaa actcaaacta 1760 cgaacgcact cacagtc.gca toataatcct citctoaagga cittcaaactic tactic cc act 1820 aatagattitt tgatgactitc tagcaa.gc.ct cgctaaccto gccttaccoc ccact attaa. 1880 cc tactggga gaactcitctg tgctagtaac cacgttctoc tgatcaaata toacticitoct 1940 acttacagga citcaiacatac tag to acago ccitat acticc citctacatat ttaccacaiac acaatggggc toactica.ccc. accacattaa. caa.cataaaa. cc citcattca cac gagaaaa 2060 caccctcatg titcatacacc tat cocccat totoctocta toccitcaiacc ccg acatcat 2120 taccgg gttt toctottgta aatatagittt aac Caaaa.ca toagattgtg aatctgacaa 218O cagaggctta cg accocitta tttaccgaga aagct cacaa gaactgctaa citcatg.cccc 2240 catgtctaac aac atggctt totcaactitt taaaggataa cagotatoca ttggtottag 2300 gcc.ccaaaaa ttittggtgca acticcaaata aaagtaataa ccatgcacac tactatalacc 2360 accota acco tgactitcc ct aattic coccc. atcc titacca cccitcgittaa coctaacaaa. 2420 aaaaacticat accoccatta tgtaaaatcc attgtc.gcat ccaccittitat tat cagtcto 24.80 titcc.ccacala calatattoait gtgccitagac caagaagtta ttatctogaa citgacactda 2540 gccaca acco aaaaa. C.C. Ca gctcitcccta agcttcaaac tag actactt ctic catalata 2600 ttcatc.cctg tag cattgtt cgttacatgg to catcatag aattcticact gtgatatata 2660 aacticagacc caaac attaa. tdagttctitc aaata totac tdatc.ttoct aattaccata 2720 ctaatcttag ttaccgctaa calaccitatto caactgttca toggctgaga ggg.cgtagga 2780 attatatoct tottgctdat cagttgatga tacgc.ccgag cagatgccala cacagoagcc 284 O attcaa.gcaa toctatacala cc.gitatcggc gatatoggtt tdatcctc.gc cittagcatca 29 OO tittatccitac acticcaactic atgaga.ccca caacaaatag cc cittctaaa. cgctaatcca 2960 agccitcacco cactac tagg ccticcitcc ta. gCagCagcag gcaaatcago cca attaggit 3020 citccaccc.cit gacitcc ccto agc catagaa ggc.cccaccc cagtc.tcago cc tacticcac toaa.gcacta tagttgtagc aggaatctitc titact catcc. 
gctitccaccc cctagoagaa 314 O aatagoccac taatccaaac totalacacta tgcttagg.cg citaticaccac totgttc.gca 3200 gcagtctg.cg cc cittacaca aaatgacatc aaaaaaatcg tagccttcto cacttcaagt 326 O caac taggac tdataatagt tacaatcggc atcaiaccaac cacaccitagc attcctgcac 3320 atctgtacco acgccttctt caaag.ccata citatttatgt gcticcgggto catcatccac 3380 alaccittaa.ca atgaacaaga tatto gaaaa ataggaggac tacticaaaac catacct citc 34 40 actitcaiac ct cc citcaccat tgg cagocta gcattagcag gaatacctitt cct cacaggt 3500 ttctacticca aag accacat catcgaalacc gcaaacatat catacacaaa. cgc.ctgagcc 356 O citatictatta citctdatc.gc tacctccctg acaag.cgc.ct atago actog aataatticitt 362O citcaccostala Cagg to aacc togctitcc cc accottacita acattaacga aaata accoc 3 680 accotactala accoccattaa. acgc.ctggca gcc.ggaa.gc.c tattogcagg atttcticatt 3740 actaacaa.ca tittccc.ccgc atcc.ccctitc. Caaaaaaa. toccc.cticta ccitaaaactic 38 OO acagoc citcg citgtcactitt ccitaggacitt cita acagccc tag acct caa citacctalacc 3860 aacaaactta aaataaaatc cccactatgc acattittatt totccalacat acticggattic 392 O US 2005/O123913 A1 Jun. 9, 2005 61

-continued taccctagoa toacacaccg cacaatcc cc tat citaggcc ttcttacgag ccaaaacctg 398O ccccitactcc toctag acct aacctgacta gaaaagctat tacctaaaac aattitcacag 4040 caccaaatct coacctccat catcaccitca accoaaaaag goataattaa actttactitc 41 OO citctotttct tctitcc cact catcctaacc ctacticcitaa toacata acc tattocc cc.g 416 O agcaatctoa attacaat at atacaccaac aaacaatgtt caaccagtaa citact actaa 4220 tdaacgcc.ca taatcataca aagcc.ccc.gc accalatagga to citc.ccgaa totalaccctga 428O ccccitctdct tcataaatta titcagottcc tacactatta aagtttacca caaccaccac 434 O cccatcatac totttcacco acagdaccaa toctacctoc atc.gcta acc ccactaaaac 4 400 actcaccalag accitcaa.ccc citg accocca tocct cagga tactccticaa tagccatc.gc 4 460 tgtagtatat coaaag acaa ccatcattcc ccctaaataa attaaaaaaa citattaaacc 4520 catata acct cocccaaaat tdagaataat a acacaccc.g. accacaccgc taacaatcaa 4580 tactaa acco coataaatag gagaaggott agaagaaaac cccacaaacc ccattactaa 4 640 accoacactc aacagaaa.ca aag catacat cattattoto goacggacta caaccacgac 47 OO caatgatatgaaaaac catc gttgtatttcaactacaaga acaccalatga ccc caatacg 476 O caaaactaac coccitaataa aattaattaa ccactcattc atcg accitcc ccaccccatc 482O caa.catctoc goatgatgaa actitcggcto acticcittggc gcc toccitga toctocaaat 488 O Caccacagga Citattoctag ccatgcacta Citcaccagac goctoaaccq CCttittcatC 494. 
O aatcgc.ccac atcactic gag acgtaaatta togctgaatc atcc.gctacc titcacgc.cala 5 OOO tggcgc.ctica atattottta totgc citctt Cotacacatc ggg.cgaggcc tat attacgg 5060 atcatttcto tacticagaaa cct gaaacat cqg cattatc citcctgcttg caactatago 5 120 aacago ctitc ataggctato tcc toccgtg aggccaaata to attctgag g g gccacagt 5 18O aattacaaac titactato.cg ccatc.ccata cattgggaca gaccitagttcaatgaatctg 5240 aggaggctac toagtagaca gtc.ccaccct cacacgatto tttacctittc actitcatctt 53OO gcc.citt catt attgcagocc tagcaa.cact coaccitccta ttcttgcacg aaacgggatc 536 O aaacaaccoc citaggaatca cotcc cattc cqataaaatc acctitccacc cittactacac 542O aatcaaagac gocctdggct tacttctott cottctotcc ttaatgacat taacactatt 54.80 citcaccagac citcc tagg.cg accolagacaa ttatacccta gccaa.cccct taalacacco c 554. O tocc cacatc aag coc gaat gatatttcct attc.gc.ctac acaattctoc gatcc.gtc.cc 5 6.OO taacaaacta ggagg.cgtcc ttgcc ctatt actatocatc citcatcc tag caataatccc 566 O catcctccat atatocaaac aacaaag.cat aatatttc.gc ccactaagcc aatcactitta 572O ttgacticcita gcc.gcagacc toc to attct aacctgaatc ggaggacaac cagtaagcta 578O cccttttacc atcattggac aagtag catc cqtactatac titcacaacaa toctaatcct 584 O aataccaact atctoccitaa ttgaaaacaa aatactcaaa toggcctgtc. cittgtagtat 59 OO aaactaatac accagt cittg taalaccggag atgaaaacct tttitccalagg acaaatcaga 596 O gaaaaagttct ttaactccac cattagdacc caaagctaag attctaattit aaactattot 6O20 citgttcttitc atggggaagc agatttgggit accacco aag tattgactica cocatcaa.ca 608 O accgctatot attitcgtaca ttact gccag ccaccatgaa tattgtacgg taccatalaat 614 O acttgaccac citgtag taca taaaaaccoa atccacatca aaaccoccitc cccatgctta 62OO US 2005/O123913 A1 Jun. 9, 2005 62

caagcaagta cagcaatcaa ccctcaacta tcacacatca actgcaactc caaagccacc 16260
cctcacccac taggatacca acaaacctac ccacccttaa cagtacatag tacataaagc 16320
catttaccgt acatagcaca ttacagtcaa atcccttctc gtccccatgg atgacccccc 16380
tcagataggg gtcccttgac caccatcctc cgtgaaatca atatcccgca caagagtgct 16440
actctcctcg ctccgggccc ataacacttg ggggtagcta aagtgaactg tatccgacat 16500
ctggttccta cttcagggtc ataaagccta aatagcccac acgttcccct taaataagac 16560
atcacgatg 16569
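The claims name nucleotide alleles in the form 663G, that is, a position followed by the base observed there. A minimal sketch of how such labels might be checked against a full-length sequence is given here, assuming the positions follow the numbering of SEQ ID NO:2 (whose placeholder n at 3106 keeps later positions aligned). The table holds only the alleles recited for haplogroups A, H, and J in claim 83; the function names are hypothetical, and the simple "at least one allele present" rule does not cover claims that also require the absence of alleles, such as claim 84.

```python
import re

# Allele label: 1-based position followed by one base, e.g. "663G".
ALLELE_RE = re.compile(r"^(\d+)([ACGT])$")

# Illustrative subset taken from claim 83 (haplogroups A, H, and J).
DIAGNOSTIC_ALLELES = {
    "A": ["663G", "16290T", "16319A"],
    "H": ["2706A", "7028C"],
    "J": ["295T", "12612G", "13708A", "16069T"],
}


def parse_allele(label):
    """Split a label such as '663G' into (663, 'G')."""
    match = ALLELE_RE.match(label)
    if match is None:
        raise ValueError(f"not an allele label: {label!r}")
    return int(match.group(1)), match.group(2)


def has_allele(sequence, label):
    """True if `sequence` carries the named base at the named position."""
    position, base = parse_allele(label)
    if not 1 <= position <= len(sequence):
        raise IndexError(f"position {position} is outside the sequence")
    return sequence[position - 1].upper() == base


def candidate_haplogroups(sequence, table=DIAGNOSTIC_ALLELES):
    """List haplogroups for which at least one listed allele is present."""
    return sorted(
        haplogroup
        for haplogroup, alleles in table.items()
        if any(has_allele(sequence, allele) for allele in alleles)
    )
```

A sequence in which only position 663 is resolved, as g, would be reported as a candidate for haplogroup A alone. A practical implementation would need the complete allele table and the exclusion rules of the dependent claims.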

1-81. (canceled)

82. A method for diagnosing a haplogroup of a human comprising:
a) providing a sample comprising mitochondrial nucleic acid from said human; and
b) identifying, in said sample, the presence or absence of at least one nucleotide allele diagnostic of a haplogroup, said at least one nucleotide allele selected from the group consisting of alleles listed in Table 3.

83. The method of claim 82 wherein said haplogroup is selected from the group consisting of:
a) haplogroup A wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 663G, 16290T, and 16319A;
b) haplogroup C wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 3552C, 4715G, 7196A, 8584A, 9545G, 13263G, 14318C, and 16327T;
c) haplogroup D wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 4883T, 5178A, 8414T, 14668T, and 15487T;
d) haplogroup E wherein method step b) comprises identifying in said sample the nucleotide allele 16227G;
e) haplogroup F wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 12406A and 16304C;
f) haplogroup G wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 4833G, 8200C, and 16017C;
g) haplogroup H wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 2706A and 7028C;
h) haplogroup I wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 4529T, 10034C, and 16391A; and
i) haplogroup J wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 295T, 12612G, 13708A, and 16069T.

84. The method of claim 82 wherein said haplogroup is haplogroup B and wherein method step b) comprises:
1) identifying in said sample nucleotide allele 16189C;
2) identifying in said sample the absence of a nucleotide allele selected from the group consisting of 1719A, 3516G, 6221C, 14470C, and 16278T; and
3) identifying in said sample the absence of a nucleotide allele selected from the group consisting of 1888A, 4216C, 4917G, 8697A, 10463C, 11251G, 11467G, 12308G, 12372A, 12633T, 13104G, 13368A, 14070G, 14905A, 15452A, 15607G, 15928A, 16126C, 16163C, 16186T, 16249C, and 16294T.

85. The method of claim 82 wherein said haplogroup is selected from the group consisting of:
a) haplogroup T wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 11812G, 12633T, 14233G, 16163C, 16186T, 1888A, 4917G, 8697A, 10463C, 13368A, 14905A, 15607G, 15928A, and 16294T;
b) haplogroup U wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 3197C, 4646C, 7768G, 9055A, 11332T, 13104G, 14070G, 15907G, 16051G, 16129C, 16172C, 16219G, 16249C, 16270T, 16311T, 16318T, 16343G, and 16356C;
c) haplogroup V wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 72C, 4580A, and 15904T;
d) haplogroup W wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 204C, 207A, 1243C, 5046A, 5460A, 8994A, 11947G, 15884C, and 16292T;
e) haplogroup X wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 1719A, 3516G, 6221C, and 14470C;

f) haplogroup Y wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 7933G, 8392A, 16231C, and 16266T; and
g) haplogroup Z wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 11078G, 16185T, and 16260T.

86. The method of claim 82 wherein said haplogroup is selected from the group consisting of:
a) haplogroup L0 wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 4586C, 9818T, and 8113A;
b) haplogroup L1 wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 825A, 2758A, 2885C, 7146G, 8468T, 8655T, 10688A, 10810C, and 13105G;
c) haplogroup L2 wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 2416C, 2758G, 8206A, 9221G, 11944C, and 16390G; and
d) haplogroup L3 wherein method step b) comprises identifying in said sample at least one nucleotide allele selected from the group consisting of 10819G, 14212C, 8618C, 10086C, 16362C, 10398A, and 16124C.

87. The method of claim 82 wherein said identifying step is performed using an array comprising two or more isolated nucleic acid molecules attached to a substrate at a known location, each molecule having a length of about 7 to about 30 nucleotides, each molecule comprising a sequence identical with a portion of SEQ ID NO:1 containing at least one nucleotide allele at a locus selected from the group of loci consisting of those listed in column 1 of Table 3.

88. A method for identifying an evolutionarily significant gene, said method comprising:
a) providing a first set of nucleotide sequences comprising nucleic acid sequences of at least one allelic gene located in the mitochondrial genome or portion thereof from a first population;
b) providing a second set of nucleotide sequences comprising nucleic acid sequences of the corresponding at least one allelic gene located in the mitochondrial genome or portion thereof from a second population;
c) performing neutrality analysis, comprising comparing said first set to said second set to generate a data set; and
d) analyzing said data set to identify an evolutionarily significant gene.

89. The method of claim 88 wherein said first population and/or said second population comprises at least one subpopulation, said subpopulation being selected from the group consisting of macro-haplogroup, haplogroup, sub-haplogroup, and individual.

90. The method of claim 88 wherein said second set of nucleotide sequences comprises at least 100 nucleotides identical to a portion of SEQ ID NO:2.

91. The method of claim 88 wherein said evolutionarily significant gene is a mitochondrial gene selected from the group consisting of ND1, ND2, ND3, ND4, ND5, ND6, Cytb, COI, COII, COIII, ATP6, and ATP8.

92. The method of claim 88 also comprising identifying at least one evolutionarily significant nucleotide allele by identifying a sequence difference between said first and second nucleotide sequences.

93. The method of claim 92 also comprising identifying an evolutionarily significant amino acid allele by determining the evolutionarily significant amino acid allele encoded by the codon comprising said evolutionarily significant nucleotide allele.

94. The method of claim 93 also comprising identifying an amino acid allele diagnostic of a predisposition to a physiological condition by using as said first population, individuals having said physiological condition, and using as the second population, individuals not having said physiological condition.

95. A method for diagnosing an individual with a predisposition to a selected physiological condition comprising:
a) providing a sample comprising mitochondrial nucleic acid molecule from an individual;
b) providing information identifying the geographic region in which said individual resides;
c) providing information identifying a set of haplogroups native to said geographic region;
d) determining the haplogroup of said individual from said sample;
e) comparing said haplogroup of said individual to said set of haplogroups native to said geographic region; and
f) diagnosing said individual with a predisposition to said selected physiological condition if said haplogroup of said individual is not within said set of haplogroups native to said geographic region.

96. The method of claim 95 wherein said physiological condition is selected from the group consisting of energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease.

97. The method of claim 95 also comprising associating an amino acid allele with said physiological condition, said method comprising selecting an amino acid allele useful for diagnosing said haplogroup of said individual, wherein the presence of said amino acid allele is not useful for diagnosing one or more haplogroups in said set of haplogroups native to said geographical region in which said individual resides.

98. The method of claim 97 wherein said haplogroup is selected from the group consisting of:
a) haplogroup C and the amino acid allele is selected from the group consisting of ntl 8584 T and ntl 14318 S;
b) haplogroup D and the amino acid allele is selected from the group consisting of ntl 5178 M and ntl 8414 F;
c) haplogroup G and the amino acid allele is selected from the group consisting of ntl 4833 A, ntl 8701 T, ntl 13708 T, and ntl 15452 I;

d) haplogroup L0 and the amino acid allele is selected from the group consisting of ntl 5442 L, ntl 7146 A, ntl 9402 P, ntl 13105 V, and ntl 13276 V;
e) haplogroup L1 and the amino acid allele is selected from the group consisting of ntl 7146 A, ntl 7389 H, ntl 13105 V, ntl 13789 H, and ntl 14178 V;
f) haplogroup T and the amino acid allele is selected from the group consisting of ntl 4917 D, ntl 8701 T, and ntl 15452 I;
g) haplogroup W and the amino acid allele is selected from the group consisting of ntl 5046 I, ntl 5460 T, ntl 8701 T, and ntl 15884 P; and
h) haplogroups V and H and the amino acid allele is selected from the group consisting of ntl 8701 T and ntl 14766 T.

99. The method of claim 97 wherein said haplogroup is selected from the group consisting of haplogroups A, I, X, B, F, Y, and U and the amino acid allele is ntl 8701 T.

100. A program storage device in which the steps of claim 95 are encoded in machine-readable form, said device also comprising a storage medium encoding said information identifying the geographic region in which said individual resides and a set of haplogroups native to said geographic region in machine readable form.

101. A storage device comprising a data set encoded in machine-readable form comprising nucleotide alleles selected from the group consisting of evolutionarily significant human mitochondrial nucleotide alleles, each said allele being associated in said storage device with encoded information identifying a physiological condition in humans.

102. The storage device of claim 101 wherein said physiological condition is selected from the group consisting of energetic imbalance, metabolic disease, abnormal energy metabolism, abnormal temperature regulation, abnormal oxidative phosphorylation, abnormal electron transport, obesity, amount of body fat, diabetes, hypertension, and cardiovascular disease.

103. The storage device of claim 101 also comprising encoded information associating each said nucleotide allele with a native geographic region.

104. A program storage device comprising the storage device of claim 101 and also comprising input means for inputting a haplogroup of an individual and a geographic region of said individual, said device further comprising program steps for diagnosing said individual as having a predisposition to a physiological condition.

105. A method for diagnosing a predisposition to LHON in a human comprising:
a) providing a sample from said human;
b) identifying in said sample nucleotide allele 10663C; and
c) identifying in said sample, nucleotide alleles encoding threonine at amino acid position 458 of gene ND5;
wherein the presence of said nucleotide alleles is diagnostic of a predisposition to LHON.

106. A method for diagnosing a predisposition to LHON in a human comprising:
a) providing a sample from said human;
b) identifying in said sample nucleotide allele 10663C; and
c) identifying in said sample at least one nucleotide allele selected from the group consisting of 295T, 12612G, 13708A, and 16069T,
wherein the presence of said nucleotide alleles is diagnostic of a predisposition to LHON.

107. A method for diagnosing a predisposition to LHON in a human comprising:
a) providing a sample from said human; and
b) identifying in said sample a nucleotide allele selected from the group consisting of 3635A and 4640C,
wherein the presence of said nucleotide alleles is diagnostic of a predisposition to LHON.

108. A method for diagnosing increased likelihood of developing blindness in a human comprising:
a) providing a sample from said human;
b) identifying in said sample a nucleotide allele selected from the group consisting of 11778A, 14484C and 10663C; and
c) identifying in said sample, nucleotide alleles encoding threonine at amino acid position 458 of gene ND5,
wherein the presence of said nucleotide alleles is diagnostic of a predisposition to develop blindness.

109. A nucleic acid array comprising two or more spots, each spot comprising a plurality of substantially identical isolated nucleic acid molecules attached to a substrate at a defined location, each molecule having a length of about 7 to about 30 nucleotides, and each molecule comprising a sequence identical with a portion of SEQ ID NO:1 containing at least one nucleotide allele at a locus selected from the group of loci consisting of those listed in column 1 of Table 3.

110. The array of claim 109 wherein at least one molecule has a sequence comprising a nucleotide allele selected from the group consisting of non-Cambridge human mtDNA nucleotide alleles of Table 3.

111. The array of claim 109 wherein at least one molecule has a sequence comprising a nucleotide allele selected from the group consisting of non-Cambridge human mtDNA nucleotide alleles of Table 4.

112. The array of claim 109 wherein at least one molecule has a sequence comprising a nucleotide allele selected from the group consisting of nucleotide alleles in nucleotide alleles useful for diagnosing human haplogroups and macro-haplogroups (Table 11).

113. The array of claim 109 comprising more than about twenty-five spots.

114. The array of claim 109 wherein said isolated nucleic acid molecules are about 20 nucleotides in length.

115. A method for determining the presence or absence of a nucleotide allele in a sample comprising:
a) providing a prepared human sample;
b) providing an array of claim 109;
c) contacting said array with said sample under conditions allowing quantitative hybridization;

d) measuring the pattern of hybridization of said sample to said array; and
e) analyzing said hybridization.

116. A program storage device comprising:
a) a machine readable storage device comprising a data set encoded in machine readable form, said data set comprising a plurality of nucleotide alleles and a haplogroup designation associated with each allele; and
b) input means for inputting a data set comprising one or more nucleotide alleles, said program storage device also comprising program steps for diagnosing a haplogroup by associating said input nucleotide alleles with an associated haplogroup, and displaying the result.

* * * * *
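
The allele-to-haplogroup association recited in claim 116 and the comparison in steps e) and f) of claim 95 can be illustrated with a minimal sketch. It is not the applicants' implementation. The marker alleles are copied from claim 86 and from items e) to g) of the haplogroup claim preceding it. The function names, the sample, and the native haplogroup set are hypothetical and do not come from the application.

```python
# Marker alleles copied from claim 86 and from items e) to g) of the
# haplogroup claim preceding it. "3516G", "9818T" and "16231C" are readings
# of OCR-damaged text and are assumptions.
HAPLOGROUP_MARKERS = {
    "X": {"1719A", "3516G", "6221C", "14470C"},
    "Y": {"7933G", "8392A", "16231C", "16266T"},
    "Z": {"11078G", "16185T", "16260T"},
    "L0": {"4586C", "9818T", "8113A"},
    "L1": {"825A", "2758A", "2885C", "7146G", "8468T", "8655T",
           "10688A", "10810C", "13105G"},
    "L2": {"2416C", "2758G", "8206A", "9221G", "11944C", "16390G"},
    "L3": {"10819G", "14212C", "8618C", "10086C", "16362C", "10398A",
           "16124C"},
}


def candidate_haplogroups(sample_alleles):
    """Return, sorted, every haplogroup with at least one marker in the sample."""
    observed = set(sample_alleles)
    return sorted(
        name for name, markers in HAPLOGROUP_MARKERS.items() if markers & observed
    )


def is_predisposed(haplogroup, native_haplogroups):
    """Claim 95 step f): True when the haplogroup is not in the native set."""
    return haplogroup not in set(native_haplogroups)


if __name__ == "__main__":
    sample = ["1719A", "14470C"]       # hypothetical genotyping result
    native = {"L0", "L1", "L2", "L3"}  # hypothetical native set, not from the source
    for name in candidate_haplogroups(sample):
        print(name, is_predisposed(name, native))
```

The sketch reports every haplogroup that shares a marker with the sample, mirroring the "at least one nucleotide allele" wording of the claims. Resolving a sample that matches several haplogroups is left open here.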