LIVING ON THE EDGE

POPULATION GENETICS OF FINNO-UGRIC-SPEAKING HUMANS IN NORTH EURASIA

Ville Nikolai Pimenoff

Department of Forensic Medicine University of Helsinki Helsinki, Finland

Departament de Ciències de la Salut i de la Vida Unitat de Biologia Evolutiva Universitat Pompeu Fabra Barcelona, Spain

Academic dissertation

To be publicly presented with the permission of the Medical Faculty of the University of Helsinki, in the lecture hall of the Department of Forensic Medicine on October 31st 2008 at 12 o’clock noon.

Helsinki 2008

NNikun06.inddikun06.indd 3 224.9.20084.9.2008 116:41:036:41:03 Supervisors Professor Antti Sajantila University of Helsinki Helsinki, Finland

Professor David Comas Universitat Pompeu Fabra Barcelona, Spain

Reviewers Professor Ulf Gyllensten University of Uppsala Uppsala, Sweden

Professor Pekka Pamilo University of Helsinki Helsinki, Finland

Opponent Doctor Lluís Quintana-Murci Institut Pasteur Paris, France

ISBN 978-952-92-4331-0 (paperback) ISBN 978-952-10-4913-2 (pdf) http://ethesis.helsinki.fi

Yliopistopaino Helsinki 2008

NNikun06.inddikun06.indd 4 224.9.20084.9.2008 116:41:076:41:07 “We’ll never deal with the devils in the details unless we see the big picture.”

Paul R. Ehrlich Human Natures—Genes, Cultures, and the Human Prospect

NNikun06.inddikun06.indd 5 224.9.20084.9.2008 116:41:076:41:07 CONTENTS

LIST OF ORIGINAL PUBLICATIONS ...... 7 ABBREVIATIONS ...... 8 SUMMARY ...... 9 1 REVIEW OF THE LITERATURE ...... 10 1.1 Introduction ...... 10 1.1.1 Genetic characteristics of the Finno-Ugric-speaking population ...... 10 1.1.2 Sampling in human genetic studies and the concept of a population ..... 13 1.2 Organization of the Human Genome ...... 13 1.2.1 General structure ...... 13 1.2.2 Variation in the Human Genome ...... 14 1.2.3 Population processes shaping genetic variation ...... 15 1.2.4 Visualizing genomic variation ...... 17 1.2.4.1 Molecular markers in human genetic studies ...... 17 1.2.4.3 Special characteristics of the uniparental markers ...... 18 1.2.5 Linkage disequilibrium ...... 20 1.2.5.1 Processes shaping LD ...... 20 1.2.5.2 Block structured genome, tagSNPs and LD mapping ...... 21 1.2.6 Human Genome Diversity and the HapMap project ...... 22 2 AIMS OF THE PRESENT STUDY ...... 24 3 MATERIALS AND METHODS ...... 25 3.1 Samples ...... 25 3.2 Molecular data ...... 25 3.3 Data analysis ...... 26 4 RESULTS AND DISCUSSION ...... 27 4.1 Uniparental genetic landscape in North Eurasia (I, II) ...... 27 4.2 Distribution of lactase persistence allele in North Eurasia (III) ...... 29 4.3 Patterns of LD in CYP2C and CYP2D gene subfamily regions in Europe (IV) ...... 33 5 CONCLUSIONS AND FUTURE PERSPECTIVES ...... 39 6 ACKNOWLEDGEMENTS ...... 40 7 REFERENCES ...... 42

NNikun06.inddikun06.indd 6 224.9.20084.9.2008 116:41:076:41:07 LIST OF ORIGINAL PUBLICATIONS

This thesis is based on the following original articles, which are referred to in the text by their Roman numerals. Study III has also been included in Enattah NS (2005) Molecular Genetics of Lactase Persistence, PhD thesis. University of Helsinki, Finland.

I. Hedman M, Pimenoff V, Lukka M, Sistonen P, Sajantila A (2004) Analysis of 16 Y STR loci in the Finnish population reveals a local reduction in the diversity of male lin- eages. Forensic Science International 142(1):37–43.

II. Pimenoff VN, Comas D, Palo JU, Vershubsky G, Kozlov A and Sajantila A (2008) Northwest Siberian and Mansi populations in the junction of West and East Eur- asian gene pools as revealed by uniparental markers. European Journal of Human Genet- ics advance online publication 28 May 2008 (DOI 10.1038/ejhg.2008.101).

III. Enattah NS, Trudeau A, Pimenoff V, Maiuri L, Auricchio S, Greco L, Rossi M, Len- tze M, Seo JK, Rahgozar S, Khalil I, Alifrangis M, Natah S, Groop L, Shaat N, Kozlov A, Verschubskaya G, Comas D, Bulayeva K, Mehdi SQ, Terwilliger JD, Sahi T, Savilahti E, Perola M, Sajantila A, Jarvela I, Peltonen L (2007) Evidence of still-ongoing conver-

gence evolution of the lactase persistence T-13910 alleles in humans. American Journal of Human Genetics 81(3):615–25.

IV. Pimenoff VN, Lavall G, Comas D, Palo JU, Gut I, Cann H, Excoffi er L and Sajantila A. Fine-scale recombination and linkage disequilibrium in the CYP2C and CYP2D cy- tochrome P450 gene subfamily regions in European populations and implications for as- sociation studies of complex pharmacogenetic traits (submitted).

Additional unpublished data and supplementary material have also been included in this thesis.

The original publications have been reproduced with the permission of the copyright holders.

7

NNikun06.inddikun06.indd 7 224.9.20084.9.2008 116:41:076:41:07 ABBREVIATIONS

CEPH Centre d’Étude du Polymorphisme Humain cM centimorgan CYP cytochrome P450 CYP2C19 cytochrome P450 2C19 gene CYP2C9 cytochrome P450 2C9 gene CYP2D6 cytochrome P450 2D6 gene DMEs drug-metabolizing enzymes HGDP Human Genome Diversity Project HGP Human Genome Project HVS Hypervariable segment of mtDNA indel insertion/deletion LD linkage disequilibrium LNP lactase non-persistence LP lactase persistence LPH lactase-phlorizin hydrolase MAF minor allele frequency mtDNA mitochondrial DNA NCBI National Center for Biotechnology Information

Ne effective population size NRY non-recombining part of the Y chromosome OMIM Online Mendelian Inheritance in Man SNP single nucleotide polymorphism STR short tandem repeat polymorphism tagSNP tagging SNP

8

NNikun06.inddikun06.indd 8 224.9.20084.9.2008 116:41:076:41:07 SUMMARY

In this thesis, I have explored the ori- tal genetic diversity suggested that some gins and distributions of genetic variation of the Finno-Ugric-speaking populations among the Finno-Ugric-speaking human in North Eurasia have resided in the con- populations living in remote areas of North tact zone of western and eastern Eurasian Eurasia; it aims to disentangle the underly- gene pools. This fact, along with the re- ing molecular and population genetic fac- duced uniparental and biparental genetic tors which have shaped the genetic diver- diversity found, emphasize the complex sity of these human populations. genetic background of these Finno-Ug- To determine the genetic variation with- ric-speaking populations shaped by recur- in and between these human populations I rent founder effects, admixture and genet- have used mitochondrial, Y-chromosom- ic drift. Moreover, the high frequency of

al and autosomal genetic markers. In mi- lactase persistence T-13910 allele among the tochondrial DNA analysis, we sequenced Finno-Ugric-speaking populations and the the HVS-I and HVS-II parts of the hyper- haplotype background shaped by recent variable control region along with phylo- positive selection suggests a local adap- genetically informative SNP from the cod- tive response to a lactose rich diet in North ing region of the mitochondrial genome. Eurasia. The Finno-Ugric-speaking Saami Multiple STR and SNP markers were also show a signifi cant difference in haplotype genotyped from the non-recombining part structure and LD within the cytochrome of the Y chromosome to assess the pater- P450 CYP2C and CYP2D gene subfam- nal variation among the particular North ily region mainly due to genetic drift, al- Eurasian populations. Moreover, multi- though the role of selection on these genes ple SNPs were genotyped across the LCT, responsible for xenobiotic metabolism can CYP2C and CYP2D gene regions for the not be excluded. autosomal genetic diversity analysis of bio- Based on our observations, the Finno- medical relevance. The obtained genotypes Ugric-speaking human populations show were further analyzed using various popu- unique genetic features due to the complex lation genetic methods. background of genetic diversity shaped by Our results revealed unique patterns molecular and population genetic process- of genetic diversity among the Finno- es and adaptation to remote areas of Boreal Ugric-speaking populations. Uniparen- and Arctic North Eurasia.

9

NNikun06.inddikun06.indd 9 224.9.20084.9.2008 116:41:076:41:07 1 REVIEW OF THE LITERATURE

1.1 INTRODUCTION populations (e.g. the , Saami, Khanty and Mansi) may hold genetic traces of ear- Since the discovery of numerous polymor- ly Upper Paleolithic people, who fi rst colo- phic markers in the human genome (Land- nized the North Eurasian regions c. 12,000 steiner 1931) and a well established popu- BP (Cavalli-Sforza et al. 1994, Derbeneva lation genetic theory pioneered by Haldane et al. 2002, Norio 2003b, Ross et al. 2006). (1924), Fisher (1930) and Wright (1931), The Finno-Ugric speakers represent popu- a great interest has focused on studies of lations, which can be clearly classifi ed into genetic variation in natural human popu- linguistic groups (Figure 1A) and which lations (Collins 2003, Jobling et al. 2003, inhabit relatively large and geographical- Kidd et al. 2004). Similarly, the recent ly remote areas in North Eurasia (Figure publication of the entire map of the hu- 1B). man genome along with the vast number It has been shown that the anatomically of available genetic markers and new com- modern human (i.e. Homo sapiens) colo- putational tools have enabled the analy- nized the ice-free Eurasia including some sis of whole genomes (Lander et al. 2001, areas north of the Arctic Circle already Venter et al. 2001, Collins et al. 2003, In- during the Upper Paleolithic (< 40,000 BP) ternational Human Genome Sequencing (Pavlov et al. 2001, Vasil’ev et al. 2002, Consortium 2004). The observed genom- Goebel et al. 1993). The Last Glacial Max- ic diversity within and among human indi- imum (23,000–14,000 BP) unquestionably viduals, groups and closely related species limited the northward spread of modern hu- has challenged us to understand the com- mans, until a rapid global warming around prehensive heritable variation in humans 12,000 BP initiated the melting of the con- (Carroll 2003, Collins 2003, Kidd et al. tinental ice sheet revealing novel areas for 2004). How much does the variation have colonization (Hewitt 1999). Indeed, the any functional signifi cance? How rare or permanent arrival of modern humans into common is a particular fraction of the vari- North Eurasia (Nordqvist 2000, Vasil’ev ation? What is the distribution of the varia- et al. 2002, Bergman et al. 2004) is dated tion within humans and what molecular or to the early Boreal period (< 12,000 BP), population level factors caused the distri- although the geographical origin of these bution of variation that we see today? In settlements is not entirely clear (Nordqvist this thesis, I have examined the origin and 2000, Dolukhanov et al. 2002, Kuzmin and distribution of genetic variation among the Keates 2004). North Eurasian Finno-Ugric-speaking pop- The Finnish population is one of the ge- ulations by using mitochondrial, Y-chromo- netically well-studied Finno-Ugric-speak- somal and autosomal molecular markers. ing human population (Nevanlinna 1972, Kere 2001, Norio 2003a; 2003b; 2003c). A particular reason for this has been the 1.1.1 Genetic characteristics of the Finnish Disease Heritage, a highly specifi c Finno-Ugric-speaking populations spectrum of more than 30 inherited, mostly It has been suggested that some of the recessive diseases with high prevalence in North Eurasian Finno-Ugric-speaking the Finnish population but rare or absent in

10

NNikun06.inddikun06.indd 1010 224.9.20084.9.2008 116:41:076:41:07 AB

Figure 1. A) Human populations speaking Finno-Ugric languages belong to a specifi c branch of the Uralic language family which is distinct from the Samoyed-speaking branch also within the Uralic group (Greenberg 2000). The Finno-Ugric language group is further divided into four subclusters within the Finnic group, i) Baltic languages of Finnish, Estonian and Karelian, ii) Saami languages, iii) Volgaic languages of Erza, and Mari, iv) Permic languages of Komi and Udmurt, while the Ugric group consists of the Khanty, Mansi and Hungarian languages (Abondolo 1998). B) A map showing the geographic locations of the North Eurasian Finno-Ugric-speaking populations used in this study (I–IV).

other populations (Perheentupa 1995, No- Z, U4 and U7; Sajantila et al. 1995, Meini- rio 2003c). Finns are also considered an lä et al. 2001, Hedman et al. 2007). The Y- ethnically more homogenous than several chromosome variation has revealed local other European populations (Nevanlinna reduction in the genetic diversity (Sajan- 1972, Kere 2001). However, already based tila et al. 1996) and signifi cant genetic dif- on classical protein markers the Finnish ferences between Western and Eastern Fin- population was shown to share genetic land (Kittles et al. 1998; 1999, Lahermo et roots not only with the West European pop- al. 1999, Zerjal et al. 2001, Lappalainen et ulations but also with more eastern popula- al. 2006, Palo et al. 2007). A clear eastern tions (Nevanlinna 1972; 1984, Guglielmino component (haplogroup N3; Rootsi et al. et al. 1990). These observations attracted 2007) in the Finnish Y-chromosome gene studies of mitochondrial (mtDNA) and Y- pool (> 50%) has been observed (Zerjal chromosomal markers to characterize the et al. 1997, Lappalainen et al. 2006). Re- maternal and paternal Finnish gene pool, cent accumulation of the autosomal genetic respectively (Vilkki et al. 1988, Pult et al. data has clarifi ed Finns as part of the west- 1994, Sajantila et al. 1994; 1995; 1996, La- ern cluster of the Eurasian genetic land- hermo et al. 1996; 1999, Zerjal et al. 1997; scape, although Finns are outliers among 2001, Kittles et al. 1998; 1999, Finnilä et the general European populations (Caval- al. 2001, Meinilä et al. 2001, Raitio et al. li-Sforza et al. 1994, Kidd et al. 2004, Lao 2001, Hedman et al. 2007, Lappalainen et et al. 2008). al. 2006, Palo et al. 2007). Mitochondrial The Saami is another relatively well- studies have shown a clear western origin studied European Finno-Ugric speaking and diversity of the Finnish gene pool, but populating the northernmost also minor traces (< 5%) of eastern gene parts of Norway, Sweden, Finland and Kola fl ow have been observed (eg. haplogroup Peninsula of (Ross et al. 2006). Sev-

REVIEW OF THE LITERATURE 11

NNikun06.inddikun06.indd 1111 224.9.20084.9.2008 116:41:076:41:07 eral studies have shown the Western Euro- populate mainly the Ob-river valley region pean genetic affi nity of the Saami people, in East Eurasia (Kolga et al. 2001, Der- although their origin is still controversial beneva et al. 2002, Karafet et al. 2002). (Cavalli-Sforza et al. 1994, Sajantila et al. The Mansi mtDNA variation has shown 1995, Tambets et al. 2004, Ross et al. 2006, a high frequency of Western Eurasian lin- Ingman and Gyllensten. 2007, Johansson eages, which has been interpreted as a ge- et al. 2008). However, it has been demon- netic continuum of the early Upper Paleo- strated that Saami are genetically extreme lithic populations expanding from Near outliers within Europe (Cavalli-Sforza et a. East/Southeast Europe to North Eurasia 1994), with strikingly low mtDNA diversity (Derbeneva et al. 2002). However, a clear (Sajantila et al. 1995; Lahermo et al. 1996, East Eurasian derived mtDNA component Tambets et al. 2004). The Saami mtDNA (~38%) is present in the Mansi (Derbeneva lineages are mainly of Western Eurasian et al. 2002). Therefore, it is proposed that origin while two East Eurasian lineages the present day Mansi may also contain (i.e. haplogroup D5 and Z) indicate minor genetic traces of the early Upper Paleo- (≤ 6%) Asian contribution as well (Tor- lithic people originating from Central Asia/ roni et al. 1998, Meinilä et al. 2001, Tam- South Siberia (Wells et al. 2001, Derbene- bets et al. 2004, Ingman and Gyllensten va et al. 2002, Karafet et al. 2002, Deren- 2007). Similarly, the Y-chromosome varia- ko et al. 2007). Interestingly, the Y-chromo- tion separates the Saami from other North some variation has shown some specifi city Eurasian populations, and shows low ge- (i.e. N2 lineage) among the Siberian popu- netic diversity (Sajantila et al. 1996, Laher- lations speaking , includ- mo et al. 1999, Tambets et al. 2004). Two ing the Khanty (Karafet et al. 2002). How- West Eurasian lineages (i.e. haplogroup I ever, these Uralic-speaking populations as and R1a) together and one East Eurasian a whole are not characterized by a discrete N3 lineage account for ~40% of the Saa- set of founder Y-chromosome lineages mi Y chromosomes, respectively (Wells et (Karafet et al. 2002). The uniparental ge- al. 2001, Tambets et al. 2004). Uniparen- netic composition of the Khanty and Mansi tal and autosomal marker studies suggest has been interpreted to represent a recur- that the origin of the Saami gene pool is rent amalgamation of small Eurasian popu- an admixture of Western and Eastern Eur- lation groups along with the spread of hu- asian genetic components (Cavalli-Sforza mans into North Eurasia (Derbeneva et al. et al. 1994, Tambets et al. 2004, Ross et 2002, Karafet et al. 2002). al. 2006, Ingman and Gyllensten 2007, Jo- The genetic diversity of the other North hansson et al. 2008). Based on the unique Eurasian Finno-Ugric-speaking popula- genetic diversity of Saami it is interpreted tions (i.e. Estonian, Karelian, Erza, Moksha, that the Saami population has remained in Mari, Komi and Udmurt) has been studied small and constant size since its origin (Sa- less comprehensively. The main interpre- jantila et al. 1995, Laan and Pääbo 1997, tation among these Finno-Ugric-speaking Ross et al. 2006). populations is that they have closer genet- The Ugric-speaking Khanty and Mansi ic affi nities with each other than with the ethnic groups originate from a common non-Finno-Ugric speakers (except the Lat- Ob-Ugric population on the western side vians and ) living in North Eur- of the Ural mountains (Kolga et al. 2001, asia (Zerjal et al. 1997; 2001, Khusnutdi- Derbeneva et al. 2002). Currently, they nova et al. 1999, Rosser et al. 2000, Raitio

12 REVIEW OF THE LITERATURE

NNikun06.inddikun06.indd 1212 224.9.20084.9.2008 116:41:086:41:08 et al. 2001, Derbeneva et al. 2002, Karafet in 1972, Barbujani et al. 1997, Jorde et al. et al. 2002, Laitinen et al. 2002, Kutuev et 2000, Romualdi et al. 2002, Rosenberg et al. 2006, Tambets et al. 2004, Rootsi et al. al. 2002). These results clearly show that 2007). human races do not have any biological ba- sis and should rather be considered with- in a complex historical and social context 1.1.2 Sampling in human genetic studies (Lewontin 1972, Collins et al. 2003). and the concept of a population Due to the sensitive issues concerning The sampling of individuals and the cri- human population genetic studies, ethi- teria for defi ning a population are funda- cal guidelines have been enforced since mental issues in genetic studies of natural the reports of World Health Organization populations (Waples and Gaggiotti 2006). (1964), World Medical Association (1964) In biological terms a population is defi ned and North American Regional Commit- as a group of organisms of the same spe- tee of the Human Genome Diversity Proj- cies that interbreed and occupy a particu- ect (1997) (reviewed by Greely 2001b). In lar space at a particular time (Krebs 1994). general, these guidelines aim to inform in- In practice, however, population boundar- dividuals and groups of people who want to ies are often notoriously diffi cult to defi ne, participate actively in such population ge- and humans make no exception. It is thus netic studies, having access to the genetic apparent that there is no single consensus data created, and also having a possibility to defi ne a population but instead it de- to infl uence the concept of the study. Cur- pends on the context and objectives of the rently, the most important ethical require- study (reviewed by Waples and Gaggiotti ment for human population genetic studies 2006). In practical terms, one defi nition of is an informed consent from each individ- a population in humans is often a group of ual sampled and, if possible, a group con- individuals that can be clustered according sent obtained from the appropriate cultural to some shared social or physical charac- authorities of the particular population or teristic; e.g. geography, ethnicity, linguis- ethnic group (Greely 2001b). In addition, tic affi liation, culture, subsistence pattern guidelines for giving scientifi c feedback and self-identity are often taken as proxies and for explaining the obtained results to to defi ne a population (Jobling et al. 2003). the volunteers of the particular study have Consequently, several important ethical been proposed (Greely 2001b). issues associated with genetic variation, ethnicity and race have been brought up (Greely 2001b, Tishkoff and Kidd 2004). 1.2 ORGANIZATION OF THE Especially the indigenous peoples have HUMAN GENOME been concerned about the consequences as genetic studies of human populations are of potential social and legal impact (Gree- 1.2.1 General structure ly 2001ab, Collins et al. 2003, Jobling et A single haploid human genome is estimat- al. 2003). It is important to note that in hu- ed to contain about 3.2 billion nucleotides mans racial defi nitions are not based on bi- with an average of 22,000 genes (Lander ology. Up to 95% of the genetic variation et al. 2001, International Human Sequenc- in humans resides within populations and ing Consortium 2004, Jobling et al. 2003). only 5–13% between populations (Lewont- Less than 2% of the human genome is as-

REVIEW OF THE LITERATURE 13

NNikun06.inddikun06.indd 1313 224.9.20084.9.2008 116:41:086:41:08 sumed to encode proteins (Lander et al. and Cavalleri 2005, International HapMap 2001, International Human Sequencing Consortium 2005; 2007). However, even Consortium 2004). This means that the 0.1% difference between two human ge- number of genes in humans is greatly less nomes denotes over 3 million differing bas- than earlier estimates of 100,000 and even es, and around 12 million possible variants less than identifi ed in the nematode worm (Kruglyak and Nickerson 2001, Kidd et al. Caenorhabditis elegans (23,000), the fruit 2004). This variation is enough to ensure a fl y Drosophila melanogaster (26,000) and unique genome for every human individu- the rice Oryza sativa (45,000). Moreover, al (Kidd et al. 2004). the non-coding repeat elements showed to comprise more than 50% of the human genome, outnumbering that seen in the 1.2.2 Variation in the Human Genome Caenorhabditis elegans (7%) and in the Genome variation occurs in various forms, Drosophila melanogaster (3%) (Lander et such as single-base substitutions (single al. 2001). Human and chimpanzee genom- nucleotide polymorpisms, SNPs), inser- es only diverge by 1.2%, and human and tions or deletions of tandem (short tandem Neanderthal genomes are estimated to di- repeats, STRs; variable number of tandem verge only by 0.5% (Chimpanzee sequenc- repeats, VNTRs) or dispersed elements ing and analysis consortium 2005, Green (short interspersed elements, SINEs; long et al. 2006). Moreover, compared to the interspersed elements, LINEs), rearrange- great apes, humans as a species show low ments (copy number variants, CNVs) and genetic diversity and low genetic structure, other more complex variables (Figure 2; which indicates a demographic bottleneck Denoeud et al. 2003, Watkins et al. 2003, in the early history of humans (Kaessmann Jobling et al. 2003, Kidd et al. 2004, Re- and Pääbo 2002). Consequently, human don et al. 2006). genomes are between 99.5–99.9% similar These genomic variations can also be to each other (Kidd et al. 2004, Goldstein categorized into short (< 10bp), interme-

– –

,

Figure 2. Classes of typical human genome variants.

14 REVIEW OF THE LITERATURE

NNikun06.inddikun06.indd 1414 224.9.20084.9.2008 116:41:086:41:08 diate (10bp–10kb) or long size (1kb–1Mb) 1–2kb in which the recombination rate is genomic variants (Figure 2; Jobling et al. ten or more times higher than in surround- 2003). For practical purposes, a variant in ing regions, although regions up to sever- a DNA sequence is defi ned as a polymor- al Mb have also been observed with high phism when at least two alleles are present recombination rates (Arnheim et al. 2003). in a population, and the frequency of the The overall existence of the hotspots has minor allele is MAF ≥ 0.01. The polymor- been confi rmed by larger analysis estimat- phisms may be within coding, non-coding ing between 30,000 and 50,000 hotspots or regulatory regions. Within a coding re- along the entire genome (Myers et al. gion a polymorphism may alter the amino- 2006). However, very little is known about acid composition or terminate the transla- the underlying biological mechanisms cre- tion of the corresponding protein, while ating hotspots (reviewed by Arnheim et al. variation within a regulatory region may 2003). Based on allele-specifi c hotspots, it change the level of expression of the partic- is suggested that the main factor causing ular protein. As the majority of the human hotspots is the distribution of recombina- genome is non-coding, most of the vari- tion initiation sites along the genome (Jef- ants do not affect the amino acids. How- freys and Neumann 2002, Arnheim et al. ever, there is a fraction of polymorphisms 2003). However, most hotspots have been that do affect the amino-acid composition shown to lack these motifs, indicating mul- (0.7% of SNPs, dbSNP128) or gene ex- tiple causes for the recombination hotspots pression (1.5% of SNPs, dbSNP128). The in the human genome (Myers et al. 2005). variants created by mutation can also be re- arranged by recombination, which further increases diversity. Both non-homologous 1.2.3 Population processes shaping recombination between chromosomes and genetic variation homologous recombination within a chro- Genetic diversity created by molecular mosome are fundamental genomic process- mechanisms (i.e. mutation and recombina- es exchanging genetic material and increas- tion) is further modifi ed by population-lev- ing the genetic variation initially caused by el processes of genetic drift, migration and mutations (Nachman 2001). The rate of re- natural selection (Jobling et al. 2003, Kidd combination is typically estimated in centi- et al. 2004). These evolutionary forces morgans (cM), which describe the genetic may affect the variation differently within distance in units of recombination frequen- and among the populations or genomic re- cy (1cM = 1% recombination). The whole gions. Hence, although humans are geneti- genome average recombination rate is cally ~99.9% identical, evolutionary mech- 1cM/Mb. However, the recombination rate anisms have created a substantial amount is shown to vary extensively along the ge- of genetic variation, observed 95% with- nome from 0.1cM/Mb to more than 3cM/ in and 5–13% among human populations Mb (Kong et al. 2002). More important- (Kidd et al. 2004, McVean et al. 2005). ly, Jeffreys et al. (2001) estimated that re- In a given population, each generation combination rates are organized into sharp represents a fi nite sample from the previ- local peaks and valleys (i.e. hotspots and ous generation. This random sampling of coldspots, respectively) along the genome. gametes known as genetic drift changes These recombination hotspots are defi ned allele and haplotype frequencies between as small fractions of genome between generations until the variant becomes ei-

REVIEW OF THE LITERATURE 15

NNikun06.inddikun06.indd 1515 224.9.20084.9.2008 116:41:086:41:08 ther fi xed or lost (Wright 1931). Under neu- phenotypes in succeeding generations. In tral evolution, the number of new alleles other words, individuals with allele com- generated by mutation is mainly shaped by binations better adapted to the prevailing random genetic drift (Kimura 1968, Ohta conditions are more likely to have higher 2002). Drift is modeled through the ideal chances to survive and reproduce. Alleles population model and measured as the ef- that reduce the survival are subject to neg-

fective population size (Ne). In this context, ative (i.e. purifying) selection and reduce Ne is the size of an idealized Wright-Fish- in frequency, while variants that increase er population, i.e. population with infi nite survival undergo positive selection and in- size, equal sex ratio, non-overlapping gen- crease in frequency. Moreover, other se- erations and random mating that experienc- lective forces, such as balancing selection, es the same amount of genetic drift as the may prefer heterozygote loci or maintain one under study (Wright 1931). The small- alleles at low frequency, creating high ge-

er the Ne the greater the genetic drift and netic diversity, as observed at the HLA loci vice versa. Therefore, reductions in popu- responsible for immune response (Beck lation size (i.e. bottleneck/ founder effect) and Trowsdale 2000, Jobling et al. 2003). increasing genetic drift can dramatically Recently, several studies have used molec- change the allele frequencies in popula- ular data to estimate the departures of al- tions. An example of genetic drift in hu- lele frequency distributions from neutral mans is the reduced genetic diversity of expectations and thus to detect natural se- modern human populations whose ances- lection in humans (Akey et al. 2002, Busta- tors migrated out of Africa and experienced mante et al. 2005, Sabeti et al. 2006; 2007, a bottleneck, and lower effective popula- Wang et al. 2006, Williamson et al. 2007). tion sizes compared to most sub-Saharan Based on current observations, it is evident populations (Cann et al. 1987, Armour et that selection has a strong role in shaping al. 1996, Reich et al. 2001). human genetic variation, although the rel- Migration and subsequent gene fl ow ative contribution of the positive, negative homogenizes allele frequencies between and balancing selection to human genetic populations, which reduces the effect of variation is still unclear (Kelley et al. 2006, random genetic drift (Jobling et al. 2003). Kryukov et al. 2007, Nielsen et al. 2007). The extreme scenario of a gene fl ow is an It is estimated that the majority of the nat- admixture of two populations into an ad- ural selection acting on genomes is of neg- mixed population. The extent of admixture ative selection removing new deleterious is often inferred using a predefi ned set of mutations (Nielsen et al. 2007). But most parental populations from which the ad- of the genetic surveys have focused on de- mixed population is assumed to have de- tecting positive selection to disentangle the rived (Bertorelle and Excoffi er 1998). In- molecular-level traces of evolutionary ad- deed, estimates of the global admixture of aptations and subsequent factors relating human populations inferred purely from humans to their environment (Nielsen et the genetic structure have also been report- al. 2007). ed (Pritchard et al. 2000, Rosenberg et al. Even a weak selective benefi t might 2002). cause substantial changes in allele frequen- Natural selection, originally set by Dar- cies over generations. However, in natural win (1859) and later refi ned by Fisher populations different evolutionary forc- (1930), defi nes the differential survival of es overlap with one another. Consequent-

16 REVIEW OF THE LITERATURE

NNikun06.inddikun06.indd 1616 224.9.20084.9.2008 116:41:086:41:08 ly, genetic drift affects the same alleles which more than 6.6 million are validat- subjected to natural selection, and thus it ed and 7.5 million are found within genes may be diffi cult to identify loci subject to (Build 129, April 2008). Therefore, SNPs natural selection (Kelley et al. 2006). It are well-suited for several genetic and evo-

is known that in populations of small Ne lutionary studies including candidate gene stronger selection is needed to infl uence or causal variant mapping, for assessing allele frequencies, whereas allele frequen- genetic diversity and divergence, and re- cies of larger populations might be shaped cently for disentangling whole genome as- by weaker selective forces (Nielsen et al. sociations of complex traits and risk factors 2007). such as cancer, diabetes and vascular dis- eases (Collins et al. 2004, Kidd et al. 2004, McVean et al. 2005). 1.2.4 Visualizing genomic variation STRs (i.e. microsatellites) are tandem repeat markers of 1–6 nucleotides (e.g. … 1.2.4.1 Molecular markers CACACA… dinucleotide repeats) which used in human genetic studies are the most variable types of DNA se- DNA sequencing is the ultimate tool for quences in humans (Weber 1990). The STR detecting all different genetic variants in loci have typically a high number of dif- any particular genomic region (Sanger et ferent alleles per locus, each with different al. 1977), but the method is often techni- number of repeat motifs (reviewed by El- cally limited and more expensive to con- legren 2004). Moreover, STRs are found in duct than genotyping individual SNP or all chromosomes, and with a high repeat STR loci (Mir and Southern 2000, Syvän- number variability between individuals. en 2001). Moreover, the vast number of Most of the known STRs are most likely identifi ed SNP and STR loci in humans neutral, although some are involved in hu- and new high-throughput methods enable man diseases. In the disease causing mic- effi cient and simultaneous genotyping of rosatellites, the causal factor is often the these markers (Mir and Southern 2000, increase of the repeat number over some Syvänen 2001, Collins et al. 2003). threshold level (e.g. > 30 and up to 2000 It is estimated that SNP markers ac- repeats in myotonic dystrophy; Mahade- count for most of all human genetic vari- van et al. 1992). Estimations using hu- ation, and are also likely to play a crucial man pedigree and sperm sample analysis role in how humans respond to exogenous have shown that the STR mutation rate pathogens, chemicals, drugs and other ther- is between 10-3–10-4 per locus per gener- apies (Collins et al. 2004, Kidd et al. 2004, ation (Heyer et al. 1997, Brinkmann et al. International Human Genome Sequencing 1998, Sajantila et al. 1999, Kayser et al. Consortium 2004, International HapMap 2000, Xu et al. 2000). The STR mutation Consortium 2007, McVean et al. 2005). is mainly modeled by the stepwise muta- SNPs are mostly biallelic with an estimat- tion model (SMM), which postulates that ed average mutation rate of 2.3 × 10-8 per the mutations, i.e. gain or loss of one re- site per generation (Nachman and Crowell peat, occur at fi xed rate independent of re- 2000). On average, there is one SNP with- peat length (Ohta and Kimura 1973). How- in every 200bp along the human genome, ever, this is not totally true as the rate and as currently there are more than 18 million direction of the STR mutations are shown SNPs listed in the human genome, from to be length-dependent (reviewed by El-

REVIEW OF THE LITERATURE 17

NNikun06.inddikun06.indd 1717 224.9.20084.9.2008 116:41:086:41:08 legren 2004). Microsatellites, due to their informativeness, Mendelian inheritance and relative ease of genotyping, have prov- en extremely powerful for linkage analysis of Mendelian disorders (Jorde et al. 1997, Zhivotovsky et al. 2003) and for studies of evolution and population genetic struc- ture (Rosenberg et al. 2002, Ellegren et al. 2004) as well as for genetic identifi cation and paternity testing relevant in forensic medicine (Jobling et al. 1997). Due to the high number of SNP and STR markers identifi ed in the human genome, some of the markers in the same chromo- some are close to each other. Often alleles of markers in close physical proximity are passed on from parents to offspring togeth- Figure 3. Haplotype phase estimation between er. These allele combinations, called haplo- two diploid loci of a) homozygotes, b) homozy- gote and heterozygote, and c) heterozygotes. types, either directly from haploid mtDNA Arrows denote the two diploid loci and H1/H2 and Y-chromosome loci or from diploid denote the deduced haplotypes. chromosomes can be considered as single alleles from a gamete. Haplotypes, com- pared to single markers, provide a greater lotype estimations. Molecular methods in- statistical power for analysis and reduce clude amplifi cation of a single cell genome the sample size needed for analysis of sig- (Ruano et al. 1990), allele specifi c ampli- nifi cant association (Clark 2004). However, fi cation (Michalatos-Beloin et al. 1996) or the deduction of haplotypes from diploid construction of mouse-human hybrid cells marker data is not always straightforward. with haploid human genome (Patil et al. To illustrate the problem we may assume 2001). Statistical approaches are based on three different cases (Figure 3), where two the assumption of a common ancestor ho- SNPs in a same chromosome a) are ho- mozygous at all sites. The steps from the mozygous, b) only one SNP is heterozy- observed variation to this common ances- gous and c) both SNPs are heterozygous. tor within a population are then estimated Homozygous SNPs will produce two iden- (Clark 1990). These statistical methods tical haplotypes, whereas two different hap- have shown to be powerful for accurate lotypes are observed when only one site is estimation of the haplotypes from diploid heterozygous. However, in the case of two genotypes (Excoffi er and Slatkin 1995, heterozygous SNPs, the allele combina- Stephens et al. 2001). tions are often shuffl ed by recombination making it more diffi cult to determine the true haplotypes. To overcome these limi- 1.2.4.2 Special characteristics tations, genealogical, molecular and sta- of the uniparental markers tistical methods have been used. In princi- In human cells, there are hundreds to thou- ple, using pedigree data from parents and sands copies of cytoplasmic organelles grandparents often enables accurate hap- called mitochondria. Each mitochondria

18 REVIEW OF THE LITERATURE

NNikun06.inddikun06.indd 1818 224.9.20084.9.2008 116:41:086:41:08 contain at least a single copy of mitochon- Only two pseudoautosomal telomeric seg- drial DNA (mtDNA), organized in a small ments recombine with the X chromosome, (~16.6kb) circular double-stranded DNA but these amount to less than 5% of the to- molecule, which is transmitted without re- tal length of the chromosome (Jobling and combination only through the mother. The Tyler-Smith 2003). The NRY part of the Y- mtDNA genome contains 37 genes and a chromosome is extremely gene poor, cod- non-coding control region including three ing for only 27 proteins but enriched with known hypervariable segments (HVS- many types of DNA repeats and variants. I, HVS-II and HVS-III) (Anderson et al. To date, more than 200 binary polymor- 1981, Andrews et al. 1999, Bandelt et al. phisms (i.e. SNPs), over 200 microsatellites 2006). The mtDNA has a much greater av- and several other repeat polymorphisms erage mutation rate (3.4 × 10-7 – 3.6 × 10–6) within the NRY have been characterized than nuclear genome (2.5 × 10-8) (Ingman (Jobling and Tyler-Smith 2003, Kayser et et al. 2000, Nachman and Crowell 2000, al. 2004). These two marker classes have Richards et al. 2000). As a haploid non- differing mutation rates; the slow evolv- recombining molecule the variation with- ing SNP markers allow construction of in mtDNA results only from the accumula- common Y-chromosome clusters (i.e. hap- tion of mutations. The mtDNA haplotypes logroups) and their phylogeny, whereas are often further clustered into mtDNA lin- STRs within these haplogroups enable a eages or haplogroups (Torroni et al. 2006), more detailed haplotype resolution (Knijff which possess a molecular record of the 2000, YCC 2002). The differing resolution maternal genealogical history. In humans, obtained with these markers has allowed a analysis of mitochondrial genomic diversi- detailed phylogeographic analysis of the ty and phylogeography, an analysis of geo- human male populations (Underhill et al. graphical distribution of the variation, were 2000, Wells et al. 2001, also reviewed by initially studied merely with the HVS-I Jobling and Tyler-Smith 2003). (360bp) and HVS-II (268bp) regions and Both mtDNA and Y chromosome loci more recently using complete mtDNA ge- possess only one quarter of an effective nomes. The mtDNA has proven powerful population size compared to the nuclear for assessing both the micro-geographic fe- DNA. Therefore the genetic diversity and male population histories and reconstruct- phylogenetic structure of uniparental loci ing broader prehistoric human dispersal are more sensitive to changes in the de- (Cann et al. 1987, Ingman et al. 2000, Tor- mography (e.g. bottlenecks) and generally roni et al. 2006). In addition, the high copy show greater genetic differences between number of this small circular genome has different groups or populations. All these enabled several successful ancient DNA evolutionary characteristics combined analyses (Pääbo 1989, Cooper and Poinar with the well-defi ned phylogeny and uni- 2000, Pimenoff and Korpisaari 2004). fi ed nomenclature system (YCC 2002, Tor- Y chromosome, the sex-determining roni et al. 2006), make uniparental markers male specifi c locus of the human genome ideal tools for investigating the recent hu- and a uniparental haploid counterpart of man evolution, with additional important the mtDNA, consists mainly of non-recom- applications in medical and forensic genet- bining DNA (NRY, 57Mb–60Mb), which ics (Jobling et al. 1997, Howell et al. 2003, is transmitted only from father to male Jobling and Tyler-Smith 2003). offspring (Jobling and Tyler-Smith 2003).

REVIEW OF THE LITERATURE 19

NNikun06.inddikun06.indd 1919 224.9.20084.9.2008 116:41:096:41:09 1.2.5 Linkage Disequilibrium li 2003, Wall and Pritchard 2003, Slatkin 2008). Mutations generally create LD, but 1.2.5.1 Processes shaping LD locus with a high mutation rate or a high Linkage disequilibrium (LD) defi ned as a number of alleles (e.g. STRs) tend to erode non-random association of alleles at linked LD (Ardlie et al. 2002), although recom- loci may be broken by recombination dur- bination and gene conversion are the main ing meiosis. Alleles at loci lying in close molecular factors reducing LD. The popu- proximity in a chromatid recombine less lation genetic factors (drift, migration and frequently than those far apart and are selection) have more diverse effects on LD. more likely to be in LD. Thus, a new mu- Consequently, the increased drift of small tation arising in the genome is initially in populations tends to increase LD as haplo- complete LD with the adjacent marker al- types are lost from the population (Terwil- leles, which is indicated by only three of liger et al. 1998). Thus, small isolate popu- the four possible haplotypes between the lations have been shown to possess higher two loci within a population (Figure 4, re- extended LD compared to populations of a viewed by Ardlie et al. 2002). larger size (reviewed in Ardlie et al. 2002). The most commonly used measures of It is noteworthy that inbreeding as a phe- pairwise linkage disequilibrium are |D’| nomenon inseparable from drift can pro- (Lewontin 1964) and r2 (Hill and Weir duce similar results of reduced heterozy- 1994), which both vary between complete gosity as random genetic drift in small (1.0) and no (0.0) association between loci. populations (Slatkin et al. 2008). There- Moreover, a recently developed Bayesian fore, the combined effect of drift and in- method to estimate the population recom- breeding in some human population have

bination parameter ␳ = 4Ne r from genotypes been proposed to have caused the extend- has proven to be an effi cient way to quanti- ed tracts of genomic homozygosity (Gib- fy LD differences between populations (Li son et al. 2006). Interestingly, gene fl ow and Stephens 2003, Crawford et al. 2004, between populations also enhances the LD. Evans and Cardon 2005). However, there Immediately after two populations have is no consensus on what is the best statis- admixed, LD is proportional to the allele tic for LD as LD measures are known for frequency differences between the parental several stochastic limitations e.g. differ- populations and not related to the distances ing sensibility to sampling and population between markers. In the following genera- genetic processes, although the ␳ estimate seems to be more robust to fl uctuations of ascertainment bias and marker density than the pairwise LD estimates (Lewontin 1988, Weiss and Clark 2002, Phillips et a. 2003, Evans and Cardon 2005). The strength of LD between a pair of markers depends on both molecular and population genetic factors, which have shown to generate varying patterns of LD across the human genome and populations (Ardlie et al. 2002, Wang et al. 2002, Ber- Figure 4. A new mutation G* associated with A tranpetit et al. 2003, Tishkoff and Verel- allele at nearby locus.

20 REVIEW OF THE LITERATURE

NNikun06.inddikun06.indd 2020 224.9.20084.9.2008 116:41:096:41:09 tions, this artifi cial LD between unlinked would be successful for mapping most of markers fades, while LD between nearby the common genomic variation (Carlson markers is often slowly dissipated by re- et al. 2004). The structure and distribu- combination. Strong positive selection in- tion of LD blocks along the genome has creases the frequency of an advantageous been shown to be shared by diverse human allele but also alleles closely linked to it, populations and would indicate a com- creating unusually strong LD between the mon feature in the human genome (Daly causal and neutral alleles. This phenome- et al. 2001, Gabriel et al. 2002). But the non is called genetic hitch-hiking (Smith quest for common haplotypes of the hu- and Haigh 1974), in which an entire seg- man genome has shown to be more diffi - ment of DNA (i.e. haplotype) fl anking an cult a task than expected with no clear cur- advantageous variant can rapidly rise to rent consensus (Daly et al. 20001, Gabriel high frequency or even fi xation. The se- et al. 2002, Zhang et al. 2002, Phillips et al. lective sweeps have shown signifi cantly 2003, Stumpf and Goldstein 2003, Ding et elevated LD and reduce heterozygosity al. 2005, Zeggini et al. 2005, Internation- among the closely linked neutral markers al HapMap Consortium 2007). However, within particular regions (Smith and Haigh some common features can be deduced in 1974, Kim and Stephan 2002, Nielsen et agreement with most LD studies. The sub- al. 2005). Similarly, although the overall Saharan African populations tend to have effect is generally not so strong, negative shorter LD blocks compared to non-Afri- selection against a deleterious variant may can populations. This is explained by the increase LD as the deleterious haplotypes interplay of more recent recombination are deleted from the population. and a bottleneck leading to genetic drift experienced by modern humans since the expansion out of Africa as opposed to the 1.2.5.2 Block structured genome, present-day sub-Saharan Africans (Tish- tagSNPs and LD mapping koff et al. 1996, Jorde 2000, Gabriel et al. Based on simulation studies it was as- 2002, Wall and Pritchard 2003, Conrad et sumed that genomic LD rarely extends al. 2006). Moreover, in a whole genome over 3kb (Kruglyak 1999). However, re- analysis Hinds et al. (2005) estimated that cent studies have shown that a great frac- non-African and African-American pop- tion of LD in the human genome is or- ulations have around 95,000 and 236,000 ganized into discrete sets of loci of low LD blocks with an average block size of haplotype diversity and high LD between 23.0kb and 8.8kb, respectively. Therefore, markers (i.e. haplotype or LD blocks) sep- it was proposed that further studies with arated by short regions (1–2kb) of intense larger sets of human populations are need- hotspots of recombination (Jeffreys et al. ed to establish more reliable defi nitions of 2000, Jeffreys et al. 2001, Daly et al. 2001, the block boundaries along the human ge- Gabriel et al. 2002, Goldstein 2001, Patil nome. et al. 2001, May et al. 2002). This led to Regardless of the block criteria, certain the hypothesis that most of the human ge- SNPs along the genome show complete LD nome has a block-like structure with an av- with each other even with longer distances erage LD block between few kb and 100kb (> 5kb) (Johnson et al. 2001). These tight- (Wall and Pritchard 2001). Hence, it was ly correlating SNPs are often called haplo- proposed that only few SNPs at each block type tagging SNPs (tagSNPs) as it is shown

REVIEW OF THE LITERATURE 21

NNikun06.inddikun06.indd 2121 224.9.20084.9.2008 116:41:096:41:09 that typing a few such tagSNPs allows to associated alleles in parental populations predict most other variants within the same (Chakraborty and Weiss 1988). Based on LD block (Johnson et al. 2001). Currently these observations and unique demograph- there are several approaches with congru- ic histories of the Finns and Saami, these ent results to identify tagSNPs (Chi et al. populations have often shown markedly 2006). These include: i) the identifi cation higher levels of extended LD compared to of LD blocks within the genomic region of other European populations (Varilo et al. interest, ii) the estimation of pairwise LD 1996; 2000; 2003, Laan et al. 1997; 2005, values within the LD block, and iii) the se- Kaessmann et al. 2002, May et al. 2002, lection of a few SNPs that capture most of Kauppi 2003, Johansson et al. 2005; 2007, the variation within the LD block (Carlson Service et al. 2006). In this context, LD et al. 2004). However, alternative methods has been successfully used for mapping to defi ne tagSNPs without LD block cri- monogenic diseases prevalent in the Finn- teria are also currently used (Halldorsson ish population (Hästbacka et al. 1992, de et al. 2004). More importantly, it has been la Chapelle and Wright 1998, Peltonen et shown that tagSNPs are often well-trans- al. 2000). The Saami have also been pro- ferable across populations at least within posed as a promising target population for continental regions (Gonzalez-Neira et al. LD drift mapping of complex traits (Ter- 2006, Mueller et al. 2005). williger et al. 1998, Kaessmann et al. 2002, In practise, if a marker (e.g. tagSNP) Ross et al. 2006). is in LD with a disease-causing allele, the strength of LD between the marker and the disease variants can be used to predict the 1.3. HUMAN GENOME DIVERSITY causal allele (Johnson et al. 2001). This AND THE HAPMAP PROJECT population-based LD mapping rests on the assumption that the disease causing muta- Shortly after the announcement of the Hu- tion stays linked with markers in its phys- man Genome Project (HGP), Cavalli-Sfor- ical vicinity for a certain amount of time za et al. (1991) proposed for a worldwide due to the slower decay of LD with tight- survey of the human genome variation ly linked markers (Lewontin and Kojima known as the Human Genome Diversity 1960, reviewed by Slatkin 2008). Moreover, Project (HGDP). The aim of this project recent observations have led to the hypoth- was to disentangle the structure and distri- esis that populations of small and constant bution of the genetic diversity in humans. size are ideal for LD mapping due to the Despite the diffi culties in ethical issues drift-enhanced disease and allele frequen- and criticism from scientists and indig- cy differences within a population between enous people (Greely 2001a), the HGDP the case and control samples (Terwilliger successfully collected and announced a et al. 1998). Similarly, admixture may of- worldwide sample set of 1064 individu- fer another effi cient approach for LD-map- als representing 52 populations from all ping using hybrid populations compared continents (HGDP CEPH cell line pan- to non-admixed populations (Chakraborty el, Cann et al. 2002). These samples have and Weiss 1988). However, the success since been used in a number of population of the admixture mapping depends heav- genetic studies and the results are continu- ily on the time since the admixture and the ously collected into a publicly available da- frequency differences of the disease and tabase (Cavalli-Sforza 2005).

22 REVIEW OF THE LITERATURE

NNikun06.inddikun06.indd 2222 224.9.20084.9.2008 116:41:096:41:09 The discovery of the punctuate LD sult the HapMap has characterized more along the human genome (Ardlie et al. than 4 million SNPs along the human ge- 2002, Gabriel et al. 2002) combined with nome, and recently completed phase II has the previous hypothesis of common dis- identifi ed additional 6 million SNPs (The ease/ common variant (Lander 1996, Reich International HapMap Consortium 2005; and Lander 2001) and the available high- 2007). Currently the project has been im- throughput genotyping methods boosted proved by the addition of more populations the foundation of the International Hap- (HapMap phase 3 data, www.hapmap.org). Map Project (The International HapMap Moreover, numerous fi ne-scale genom- Consortium 2003). The primary aims of ic analysis and genome-wide association the HapMap were i) to discover new ascer- studies have benefi tted from HapMap data tained SNPs across human genome, ii) to (Deloukas and Bentley 2004, McVean et characterize a genome-wide set of SNPs al. 2005). Despite recent criticism (Terwil- validated in four human populations and liger and Hiekkalinna 2006), the HapMap iii) to produce a common haplotype map project has already strongly contributed of the entire human genome using 269 to our quest for understanding the signifi - DNA samples from four ethnic human cance of the heritable genetic variation in groups (i.e. of African, European, Japa- modern humans and to disentangle the ge- nese and Han Chinese origin). The pri- netic variants relevant in complex traits of mary use of the common haplotype map human health and disease (Deloukas and is in whole genome association studies of Bentley 2004, McVean et al. 2005, The In- complex traits (The International HapMap ternational HapMap Consortium 2007). Consortium 2003). So far, as a phase I re-

REVIEW OF THE LITERATURE 23

NNikun06.inddikun06.indd 2323 224.9.20084.9.2008 116:41:096:41:09 2 AIMS OF THE PRESENT STUDY

In this thesis and the articles within I have explored the underlying molecular and pop- ulation genetic factors and processes shaping genetic variation. The main focus of this thesis has been the Finno-Ugric-speaking populations living in remote and relatively ex- treme geographic locations in North Eurasia.

Specifi cally I have focused on the following themes:

1) To study the genetic history and diversity of the Finno-Ugric-speaking populations by using uniparental markers (I, II).

2) To determine the prevalence and haplotype background of lactase persistence variant

C/T-13910 in North Eurasian populations (III)

3) To assess the recombination rate variation, haplotype structure and LD pattern within clinically signifi cant cytochrome P450 CYP2C and CYP2D gene subfamily regions in European populations including the North Eurasian Finno-Ugric-speaking Saami and Finns (IV)

24

NNikun06.inddikun06.indd 2424 224.9.20084.9.2008 116:41:096:41:09 3 MATERIALS AND METHODS

3.1 SAMPLES 3.2 MOLECULAR DATA

DNA samples consisted in total of 3119 To study the maternal neutral genetic di- healthy unrelated individuals of 53 human versity and evolutionary relationships of populations with informed consent. More- different North Eurasian human popula- over, a total of 5697 reference samples of tions, we assessed the mtDNA HVS-I and 42 Eurasian populations were obtained HVS-II region sequences between posi- from the literature. All these samples were tions 16024–16383 and 72–340, respective- used in the analysis but with differing sets ly. In addition, we analyzed seven mtDNA as described in the original publications (I– coding region SNP markers to confi rm the IV). It is noteworthy that our main interest observed mtDNA control region lineages concentrates on the North Eurasian Finno- (II). To assess the paternal neutral genet- Ugric-speaking population shown in detail ic diversity and dispersal among the North in Table 1 and also described in Pimenoff Eurasian populations, we used 17 Y-chro- and Sajantila (2002). mosome-specifi c SNP markers describing

Table 1. Finno-Ugric-speaking populations used in each study (I-IV)

Population n a Linguistic Geographic Subsistence Population References affiliation affiliation size b within

Finns 400 Finnic Northeast Agriculture 5,000,000 I, II,III,IV (Finno-Ugric) Europe

Saami 114 Finnic Northeast Reindeer 80,000 II,III,IV (Finno-Ugric) Europe breeding

Estonians 28 Finnic Northeast Agriculture 1,300,000 II (Finno-Ugric) Europe

Karelians 83 Finnic Northeast Agriculture 140,000 II (Finno-Ugric) Europe

Moksha 30 Volgaic Northeast Agriculture 380,000 II,III (Finno-Ugric) Europe

Erza 30 Volgaic Northeast Agriculture 760,000 II,III (Finno-Ugric) Europe

Udmurt 30 Permic Northeast Agriculture 640,000 II,III (Finno-Ugric) Europe

Komi 28 Permic Northeast Agriculture 340,000 II,III (Finno-Ugric) Europe

Khanty 106 Ugric Northwest Reindeer 21,000 II,III (Finno-Ugric) Siberia breeding

Mansi 161 Ugric Northwest Reindeer 8,000 II,III (Finno-Ugric) Siberia breeding

a Total amout of unrelated DNA samples used in this study b Laakso 1991, Kolga et al. 2001, Karafet et al. 2002

25

NNikun06.inddikun06.indd 2525 224.9.20084.9.2008 116:41:096:41:09 the paternal haplogroup distribution along the ␳-statistic along with mutation rates with 12 Y-chromosome specifi c microsat- from Forster et al. (1996) and Saillard et al. ellite markers, with four additional Y STRs (2000) were implemented (II). To defi ne analysed in the Finnish population (I, II). and test the uniparental phylogeographic For the haplotype analysis of lactase per- structures both spatial analysis of molecu-

sistence T-13910 allele among populations, lar variance (SAMOVA; Dupanloup et al. eight SNPs and one indel polymorphism 2002) and autocorrelation indices for DNA with minor allele frequencies MAF > 0.07 analysis (AIDA; Bertorelle and Barbujani distributed across a 30kb region of LCT 1995) were performed (II). Importantly, gene was used with additional sequences correlations between mtDNA and Y chro- (~ 700kb) fl anking the whole LCT gene re- mosome distance matrices (II) as well as

gion in particular individuals (III). To dis- between FST and population recombination entangle the allele and haplotype distribu- rate delta distances (IV) were estimated us- tion of clinically signifi cant cytochrome ing the Mantel test (Excoffi er et al. 2005). P450 CYP2C and CYP2D gene subfami- Allele frequencies and uniparental lineag- ly regions we used 55 and 97 SNP mark- es were also geographically visualized us- ers with MAF> 0.05 in dbSNP with a mean ing the MapView 6.0 program (StatSoftTM)

spacing of 7.8kb and 7.6kb, respectively (II). Moreover, pairwise FST and RST values (IV). All the genotyping methods are de- were visualized using multidimensional- scribed in detail in the original publications scaling (MDS) procedure implemented in (I-IV). the STATISTICA software package (Stat- SoftTM) (II, IV). For each population, auto- somal haplotypes were inferred separately 3.3 DATA ANALYSIS either using the Arlequin software (Excoffi - er et al. 2005, III) or PHASE v.2.1 software Population diversity indices, allele frequen- package with 1000 iterations (Stephens et cies, Hardy-Weinberg (HW) equilibrium, al. 2001, III–IV). Moreover, recombina-

and population pairwise FST- or RST-val- tion rate parameter ␳ was inferred separate- ues along with the exact test of population ly for each population and genomic region differentiation and the analysis of molecu- using software PHASE v.2.1 with 1000 it- lar variance (AMOVA) were estimated us- erations (Stephens et al. 2001, IV). Non- ing Arlequin software v3.0 (Excoffi er at al. parametric Spearman correlations between 2005) (I–IV). Phylogenetic median-joining population recombination estimates and networks were constructed using program Wilcoxon test for adjacent SNP r2-values package Network 4.5.0.0 (www.fl uxus- between populations were performed with techology.com) and when required locus SPSS 7.0 (IV). In addition, most of the au- weights described by Bandelt et al (2002) tosomal genotype data modifi cations were or Bosch et al (2006) were used (II, IV). performed prior analysis with Perl scripts To estimate the coalescence age of specif- and Perl 5.8.7 (IV). ic lineages within a uniparental network,

26 MATERIALS AND METHODS

NNikun06.inddikun06.indd 2626 224.9.20084.9.2008 116:41:106:41:10 4 RESULTS AND DISCUSSION

4.1 UNIPARENTAL GENETIC LANDSCAPE terns of North Eurasian maternal lineages IN NORTH EURASIA (I, II) (study II, Figure 4A), where Finno-Ugric- speaking populations form distinct clus- Mitochondrial and Y-chromosome stud- ters at an edge of the mtDNA haplogroup ies suggest that not only the Southwestern distribution. However, the geographical Europe (Semino et al. 2000, Torroni et al. pattern is not so clear within the Y-chro- 2001) but also Central Asia (Wells et al. mosome, except the clustering of Finno- 2001, Zerjal et al. 2002, Comas et al. 2004, Ugric and Samojedic populations togeth- Quintana-Murci et al. 2004) and South Si- er along with the Yakut population (study beria (Derenko et al. 2003; 2007ab) have II, Figure 4B). Similarly (study II, Figure

had an important role in the early settle- 3AB), mtDNA pairwise FST distances iden- ment of the modern humans into North tify Finno-Ugric-speakers as distinct clus- Eurasia. However, the genetic roots and ters between Northeast Europe and South dispersals of the North Eurasian Finno- Siberia/Central Asia, while the Y-chromo-

Ugric-speaking populations are not entire- some RST distances appeared less struc- ly clear (Cavalli-Sforza et al. 1994, Derbe- tured. Even, when the Erza, Moksha and neva et al. 2002, Karafet et al. 2002, Norio Udmurt Y-chromosomes (Pimenoff et al.

2003b, Ross et al. 2006). unpublished) are added, the RST distances To explore uniparental neutral varia- show the Finno-Ugric population totally tion among the North Eurasian Finno-Ug- dispersed with no clear structure. A mantel ric-speaking populations and situate them test between mtDNA and Y-chromosome into the North Eurasian genetic landscape, pairwise distances showed nonsignifi cant 42 ad 33 Eurasian mtDNA and Y chromo- correlation. some population samples were analyzed, Indeed, most of the Finno-Ugric-speak- respectively. In addition, Y chromosome ing populations showed to possess both STR haplotypes from 15 Eurasian popu- West and East Eurasian associated unipa- lations were used in further comparisons rental lineages (Figure 5AB, see also study (Pimenoff et al. unpublished data of 85 II, Figure 1AB). This unique amalgamation Finno-Ugric-speaking Erza, Moksha and of West and East Eurasian gene pools may Udmurt individuals were also included). indicate either mixed origin of these pop- In our analysis, geographically associ- ulations from genetically distinguishable ated uniparental haplotypes showed statis- Eastern and Western Eurasia or that North tically signifi cant frequency trends along Eurasia was initially colonized by humans the East-West axis of North Eurasia (study carrying both West and East Eurasian lin- II, Figure 1AB, 2). This is congruent with eages. Previous studies of the Saami and the current view of the clinal distribu- Finns support the idea of mixed origin in tion of West and East Eurasian uniparen- these populations (Norio et al. 2003b, Ross tal lineages (Richards et al. 2000, Semino et al. 2006, Johansson et al. 2006, Ingman et al. 2000, Underhill et al. 2000, Wells et and Gyllensten 2007). However, the Cen- al. 2001, Kivisild et al. 2002, Metspalu et tral Asian (Karafet et al. 2002, Comas et al. al. 2004, Rootsi et al. 2007). Correspon- 2004), Southwest Asian (Quintana-Mur- dence analysis also revealed east-west pat- ci et al. 2004) and South Siberian (Kara-

27

NNikun06.inddikun06.indd 2727 224.9.20084.9.2008 116:41:106:41:10 A

B

Figure 5. Distribution of the geographically associated A) mtDNA and B) Y-chromosome lineages among the Finno-Ugric-speaking Finns (fi n), Khanty (kha), Komi (kom), Mansi (man) and Saami (saa) populations. Colors for the associated haplogroups are the following: white, West Eurasian; gray, East Eurasian; black, South Asian (geographical classifi cation based on study II). Population abbrevia- tions refer to the same samples and abbreviations used in study II (Figure 1AB), except in 5B (saa2; Pimenoff et al. unpublished data).

28 RESULTS AND DISCUSSION

NNikun06.inddikun06.indd 2828 224.9.20084.9.2008 116:41:106:41:10 fet et al. 2002, Derenko et al. 2003; 2006; ed into six geographical subpopulations a 2007b) populations have shown an admix- clear local reduced heterogeneity in East- ture of West and East Eurasian lineages ern Finland and a signifi cant genetic differ- with higher overall genetic diversity com- ence between Western and Eastern Finland pared to Finno-Ugric-speakers. This sup- was observed (Figure 6; see also study I, ports the idea that the edge of North Eur- Table 3, Lappalainen et al. 2006, Palo et asia was colonized through Central Asia/ al. 2007). South Siberia by human groups already Apart from the unique genetic amal- carrying West and East Eurasian lineages. gamation and clinal distribution of west- Moreover, the observed mitochondrial U7 ern and eastern uniparental lineages, the and Y-chromosome N2 sublineages indi- North Eurasian Finno-Ugric-speaking cate a more recent gene fl ow probably from populations are genetically a heteroge- Central Asia to North Eurasia (Rootsi et al. neous group, mostly showing lower hap- 2007). In this context it could be explained lotype diversities among and within unipa- that the geographical region from Central rental haplogroups when compared to more Asia to Northeast Europe and Northwest southern populations (see also Karafet et Siberia has been a contact zone of genet- al. 2002, Derenko et al. 2003; 2006; 2007b, ically distinguishable Western and East- Comas et al. 2004). In a broader perspec- ern Eurasian lineages formed by recur- tive, the ensuing loss of genetic diversity ring migrations and admixture of distinct in populations living in Arctic and Boreal population groups. This unique admixture regions compared to more southern areas is currently seen in North Eurasian Finno- has been clearly demonstrated in a range Ugric-speaking populations of other species as well (Hewitt 2001). It is When the number of observed Y-chro- explained by the presence of southern ref- mosome STR haplotypes was divided by uges through the Last Glacial Maximum (c. the number of sampled individuals with- 13,000 years ago) and subsequent founder in a population, a lower fraction of hetero- effects, gene fl ow and genetic drift shaping geneity was observed in the Finns (43%), the genetic diversity of small population Khanty (43%), Mansi (52%) and Saa- groups migrating to the North Eurasia. mi (52%) compared to Finno-Ugric Erza (81%), Komi (69%), Moksha (86%) and Udmurt (67%) or most other more south- 4.2 DISTRIBUTION OF LACTASE ern populations. The difference in heteroge- PERSISTENCE ALLELE IN NORTH neity could be explained by the at least four EURASIA (III) times smaller population size (Table 1) and thus the greater sensitivity to genetic drift Most humans cannot digest lactose, i.e. in the Khanty, Mansi and Saami compared the main milk carbohydrate, after wean- to the Erza, Moksha, Komi and Udmurt ing due to the natural reduced activity of populations. However, in the Finns it is not the lactase-phlorizin hydrolase (LPH) en- the population size but a probable popula- zyme in intestinal cells (Sahi et al. 1973, tion bottleneck in the founding of the Finn- reviewed by Swallow 2003). These indi- ish population (Sajantila et al. 1996, Laher- viduals are considered as lactase non-per- mo et al. 1999), which explains the reduced sistent (LNP) [MIM223100]. However, Y-chromosome heterogeneity (Figure 6). some people maintain the LPH activity Moreover, when the samples were divid- throughout life, i.e. are lactase persistent

RESULTS AND DISCUSSION 29

NNikun06.inddikun06.indd 2929 224.9.20084.9.2008 116:41:106:41:10 Figure 6. Distribution of the most common Y-chromosome 16-loci STR haplotypes within six subpopu- lations of Finland (study I, Figure 3B).

(LP) [MIM223100], and thus can use milk shown to correlate completely with the LP/ products without metabolic diffi culty. In- LNP phenotype among the North Europe-

terestingly, among the North Eurasian and ans (Enattah et al. 2002). This T-13910 vari- some sub-Saharan African populations, of- ant correlating with LP is located 14kb up- ten with high dairy product consumption, stream of the LCT gene, which encodes for the lactase persistence is relatively com- LPH (Enattah et al. 2002). Previous hap- mon (> 80%), whereas the lactase non-per- lotype analysis showed that all LP alleles sistence is predominant among the rest of among Finns originated from one com- populations worldwide (Swallow 2003). mon ancestor indicating a single introduc-

Previously a single SNP C/T-13910 was tion of lactase persistence allele into North

30 RESULTS AND DISCUSSION

NNikun06.inddikun06.indd 3030 224.9.20084.9.2008 116:41:106:41:10 Europe (Enattah et al. 2002). It was also haplotypes carrying the T-13910 allele (Fig- confi rmed that the LNP is the ancestral ure 7). The fi rst cluster was observed only state in humans like in most mammals and among the Finno-Ugric Udmurts (15%), mutations causing LP have arisen proba- Mokshas (11%) and Erzas (5%) along bly due to a recent positive selection as an with Iranians (5%), while the second clus- adaptation to energy need in lactose rich ter of LP haplotypes including the domi- diet coincident with the animal domes- nant H98 LP haplotype was observed in tication (Hollox et al. 2001, Enattah et al. almost all populations (Figure 7; study III, 2002, Beja-Pereira et al. 2003, Bersaglieri Table 5). This observation of multiple LP et al. 2004, Myles et al. 2005, Tishkoff et haplotype clusters among the Finno-Ug- al. 2007). ric-speaking populations indicate two in-

To investigate the allelic background dependent origins of the LP T-13910 allele

of LP variant T-13910 in North Eurasia, we in North Eurasia. We also indentifi ed a genotyped eight SNPs and one indel poly- probable single LNP haplotype responsi-

morphism, including C/T-13910 variant, cov- ble for the background on which the most ering the ~30kb of the LCT region in 37 common major LP haplotype was derived. worldwide populations (study III, Figure This LNP background haplotype shows 1). Our results showed a high frequency the highest frequency among the Finno-

of LP T-13910 allele especially among the Ugric-speaking populations (between 33 Finno-Ugric-speaking Finns (58%), Ud- and 35%) along with Han Chinese (36%), murt (33%), Moksha (28%) and Erza which might indicate an East Eurasian or- (27%) (study III, Table 3). Interestingly, igin of the particular ancestral haplotype. the reindeer breeding Khanty (Ob-Ugric, However, the molecular and demographic 3%), Mansi (Ob-Ugric, 3%) and Saami factors may bias interpretation based pure-

(17%) exhibit the lowest LP T-13910 allele ly on population frequencies. The age esti-

frequencies among the Finno-Ugrics com- mates showed that the common LP T-13910 pared to the agriculturalists with the excep- haplotype cluster (12,000–5,000 BP) is

tion of the Komi (15%; N=10) (Table 1). older than the LP T-13910 haplotype cluster Furthermore, we identifi ed nine differ- (3,000–1,400 BP) mainly observed among

ent haplotypes carrying the T-13910 LP vari- the North Eurasian Finno-Ugric-speaking ant and 14 haplotypes with alleles carrying agriculturalist populations. This LP T-13910 the C-13910 LNP variant, each with a fre- allele and haplotype frequency distribu- quency of > 4% in at least one of the popu- tion in Finno-Ugric-speaking populations lations (study III, Table 5). One of the nine along with previous reports among the LP haplotypes dominate in LP alleles in sub-Saharan populations (Tishkoff et al. most populations including the Finno-Ug- 2007, Ingram et al. 2007) strongly imply

rics, in which the highest frequencies are that the LP T-13910 variant has been intro- among the agriculturalists. Six other LP duced independently more than once into haplotypes were also observed at the rea- North Eurasia. Moreover, the observed re- sonably frequency (between 2% and 11%) sults support the role of still-ongoing con- in the Finno-Ugric-speaking populations vergent evolution of the lactase persistence mostly among the agriculturalists. among the Finno-Ugrics in response to A median-joining network construct- adult milk consumption coincident with ed from these common haplotypes (MAF the change in subsistence at the edge of > 4%) revealed two distinct clusters of LP North Eurasia (Table 1).

RESULTS AND DISCUSSION 31

NNikun06.inddikun06.indd 3131 224.9.20084.9.2008 116:41:106:41:10 ion among 37 haplotypes with green color. haplotypes with green color. king populations discussed in the alleles within the haplotype are shown in box. The SNPs have been coded for each site as 1 -22018 and G/A -13910 MJ-network of common (MAF> 4%) LNP/LP haplotypes constructed from eight SNPs and one indel marker across the 30kb LCT gene reg MJ-network of common (MAF> ancestral SNP and 2 for the derived SNP. Figure 7. Figure The size of the circles are proportional to estimated haplotype frequencies. Haplotype frequencies for Finno-Ugric-spea text are shown. The positions of the C/T worldwide populations (study IV, Figure 5). Arrow denotes the root of network. LNP haplotypes are shown with yellow and LP worldwide populations (study IV,

32 RESULTS AND DISCUSSION

NNikun06.inddikun06.indd 3232 224.9.20084.9.2008 116:41:106:41:10 4.3 PATTERNS OF LD IN CYP2C AND To characterize the recombination rate CYP2D GENE SUBFAMILY REGIONS IN variation, LD distribution and haplotype EUROPE (IV) structure in the CYP2C and CYP2D re- gions we genotyped 144 SNPs across these The cytochrome P450 oxidase gene fam- two regions in Finno-Ugric-speaking Saa- ily comprises a set of evolutionary-relat- mi and Finns along with nine other Europe- ed genes that code for xenobiotic metabo- an and one African population (study IV). lism enzymes (Ingelman-Sundberg 2004). A further aim was to disentangle the past In humans, genes within the CYP2C molecular and population genetic process- and CYP2D regions of the cytochrome es responsible for the observed LD distri- P450 gene subfamily code for CYP2C8, bution, inferring from known differences CYP2C9, CYP2C18, CYP2C19 and in demographic history of Saami and Finns CYP2D6 drug-metabolizing enzymes compared to other European populations. (DMEs) (Wilkinson 2005). These genes In agreement with results obtained from are highly polymorphic with several known other marker analysis (Cavalli-Sforza et al. genetic variants associated to variable drug 1994, Ross et al. 2006, Lao et al. 2008), reactions of signifi cant clinical relevance the Finno-Ugric-speaking Saami and Finns (Lewis 2004). showed signifi cantly different CYP2C and

Figure 8. Multidimensional scaling (MDS) of population pairwise FST distances between 11 European populations across CYP2C and CYP2D gene regions (stress value = 0.073).

RESULTS AND DISCUSSION 33

NNikun06.inddikun06.indd 3333 224.9.20084.9.2008 116:41:116:41:11 A CYP2C18 CYP2C19 CYP2C9

CYP2D6 B

Figure 9. Recombination rate estimates across the A) CYP2C and B) CYP2D regions. Y axis is expressed in log scaled units of recombination rate (4Nr). The dash lines represent the upper and lower 95% confi dential intervals of the 10-fold average recombination rate among the European populations at either region. The position of genes is shown as horizontal black bars on top of each graph.

CYP2D allele frequencies from other Eu- The estimated patterns of recombina- ropean populations (Figure 8). For the rest tion rate variation revealed signifi cant but of the European populations including the lower correlation among European popu- CEPH sample representing the general Eu- lations compared to correlations observed ropean population as such in the HapMap between continental groups (Evans and project, the observed locus-specifi c and Cardon 2005). Regardless of the allele fre-

population pairwise FST-values indicate quency differences and recombination rate low degree of allele frequency differentia- heterogeneity among the studied popula- tion for the two cytochrome P450 regions. tions, the location and magnitude of de-

34 RESULTS AND DISCUSSION

NNikun06.inddikun06.indd 3434 224.9.20084.9.2008 116:41:116:41:11 A

B

Figure 10. The haplotype blocks identifi ed at CYP2C (A) and CYP2D (B) loci based on Gabriel et al. (2002) are shown in bars. Bars containing diagonal lines are those identifi ed within the extended LD region at both loci. Empty bars are LD blocks characterized outside the extended LD region. The posi- tion of genes is shown as horizontal black bars (only CYP2 genes identifi ed) below the depicted chro- mosome. Vertical arrows denote estimated recombination hotspots of R1-R3.

RESULTS AND DISCUSSION 35

NNikun06.inddikun06.indd 3535 224.9.20084.9.2008 116:41:116:41:11 A

B

Figure 11. Decay of r2 against the distance (kb) between marker pairs within the extended LD region defi ned as logarithmic best-fi t curves along A) CYP2C and B) CYP2D regions. Population abbrevia- tions are as reported in study IV (Table 1).

tected recombination hotspots R1–R3 are combination map inferred from the Hap- conserved in all 11 European populations Map CEPH data would be applicable for (Figure 9) and the African Mandenka pop- other European populations. However, the ulation. Interestingly, the CEPH European loci with lower recombination rate exhibit reference sample shows a very similar re- more variation in the rates between popula- combination profi le with other European tions indicating differences either in recom- populations suggesting that a fi ne-scale re- bination histories or in past demography. In

36 RESULTS AND DISCUSSION

NNikun06.inddikun06.indd 3636 224.9.20084.9.2008 116:41:126:41:12 more detail, the Saami shows the lowest cantly the shortest LD blocks and the low- recombination rate profi le for CYP2D but est proportion of high LD while the Saami not for CYP2C region, suggesting that re- show again the highest proportion of high combination rate estimates are also shaped LD (study IV, Table 1), signifi cantly differ- by other evolutionary factors than popula- ent pattern of adjacent marker r2 values and tion history alone (Figure 9). the longest single LD block (Figure 10B). In previous studies the extended LD The decay of LD within CYP2D region within the CYP2C region has shown to be shows the slowest decay in the Saami, but conserved across populations, although this is only seen in distances below 50kb the particular LD block structure is still as thereafter the decay is much faster in the under debate (Ahmadi et al. 2005, Walton Saami and Finns compared to other popu- et al. 2005, Vormfelde et al. 2007). Simi- lation. It is also noteworthy that the Saami larly, our data show a clear extended LD possess also the lowest number of CYP2C across the CYP2C region (Figure 10A) but and CYP2D haplotypes but higher CYP2D with a varying pattern of adjacent marker haplotype diversity compared to most oth- r2 values across the populations. Moreover, er Europeans (study IV, Table 1). the LD block structure is consistent across The Saami population has remained populations within CYP2C region with few small and constant in size throughout its exceptions (Figure 10A). Firstly, the sub- history and is considered to have an ad- Saharan Mandenka population shows sig- mixed West and East Eurasian genetic ori- nifi cantly the shortest LD blocks (Figure gin (Ross et al. 2006). The high and extend- 10 A) with lowest proportion of high LD ed LD with fl uctuating haplotype diversity (study IV, Table 1), which is due to the lon- in the Finno-Ugric-speaking Saami could ger period of recombination compared to be linked to the small and constant effec- European populations who migrated out of tive population size with admixture and/or Africa and experienced a demographic bot- subsequent long-term genetic drift shap- tleneck. Secondly, the CEPH sample shows ing the genetic diversity and LD compared a different pattern of LD block distribution to other European populations (Ross et al. (Figure 10A) and the lowest proportion of 2006). Both admixture and drift are shown high LD among Europeans (study IV, Ta- to generate enhanced but random patterns ble 1), while the Saami exhibit the highest of genetic diversity and LD (Terwilliger et proportion of high LD, signifi cantly differ- al. 1998, Ardlie et al. 2002). 2 ent adjacent marker r values and the lon- Moreover, a CYP2C19*2 A-19154 mu- gest single LD block among all popula- tation causing reduced activity for the tions (Figure 10A). Moreover, the decay of CYP2C19 DME shows signifi cantly higher high LD within the CYP2C extended LD allele frequency in Saami (95% CI: 28.1– region is slowest in the Saami and fastest 47.4%) compared to other European pop- in the Mandenka sample (or CEPH sample ulations (95% CI: 13.4–16.0%; Xie et al. within Europe) (Figure 11A). 1999). Interestingly, similar high frequen- Within the CYP2D region also a clear cies (≥ 0.30) of the altered activity allele are extended LD region is observed (Figure observed in Central and East Asia (Xiao et 10B), although the adjacent marker r2 val- al. 1997, Kimura et al. 1998, Wedlund et ues and the LD block distribution are more al. 2000, Niu et al. 2004). This, assuming heterogenous compared to CYP2C region. an East Eurasian origin of the CYP2C19*2

The sub-Saharan Mandenka show signifi - A-19154 mutation indicates signifi cant Asia

RESULTS AND DISCUSSION 37

NNikun06.inddikun06.indd 3737 224.9.20084.9.2008 116:41:126:41:12 contribution to the Saami gene pool simi- structure to tag haplotypes more effi ciently larly as observed in HLA markers (Johan- with larger genomic coverage compared to sson et al. 2008), although the combined other populations analyzed. This strategy effect of selection and drift may have en- of LD-based drift mapping originally pro-

hanced the level of A-19154 allele frequency. posed by Terwilliger et al (1998) may offer The Finno-Ugric-speaking Saami, based a great advantage for further identifi cation on the lower level recombination and high- of alleles associated to common complex er extended LD within the CYP2C and pharmacogenetic traits. CYP2D regions exhibit an optimal genetic

38 RESULTS AND DISCUSSION

NNikun06.inddikun06.indd 3838 224.9.20084.9.2008 116:41:126:41:12 5 CONCLUSIONS AND FUTURE PERSPECTIVES

The major aim of this thesis was to exam- denka population. From all studied popu- ine the origins and distribution of uniparen- lations the Saami showed also signifi cantly tal and autosomal genetic variation among the highest allele frequency of a CYP2C19 the Finno-Ugric-speaking human popula- gene mutation causing variable drug reac- tions living in Boreal and Arctic regions of tions. The diversity patterns observed with- North Eurasia. In more detail, I aimed to in CYP2C and CYP2D regions emphasize disentangle the underlying molecular and the strong effect of demographic history population genetic factors which have pro- shaping genetic diversity and LD especial- duced the patterns of uniparental and au- ly among such small and constant size pop- tosomal genetic diversity in these popu- ulations as the Finno-Ugric-speaking Saa- lations. Among Finno-Ugrics the genetic mi. Moreover, the increased LD in Saami amalgamation and clinal distribution of due to genetic drift and/or admixture was West and East Eurasian gene pools were shown to offer an advantage for further at- observed within uniparental markers. This tempts to identify alleles associated to com- admixture indicates that North Eurasia mon complex pharmacogenetic traits. was colonized through Central Asia/ South A challenge in future studies of human Siberia by human groups already carry- genome variation is to understand the mo- ing both West and East Eurasian lineages. lecular basis of common complex diseas- The complex combination of founder ef- es, and variable sensitivity to drugs, patho- fects, gene fl ow and genetic drift underly- gens and other environmental factors when ing the genetic diversity of the Finno-Ug- recent developments in genotyping and ge- ric-speaking populations were emphasized nomic resequencing have enabled the high- by low haplotype diversity within and throughput genome-wide studies such as among uniparental and biparental markers. the HapMap (International HapMap Con- A high prevalence of lactase persistence sortium 2007) and 1000 Genomes (Kaiser allele among the North Eurasian Finno- 2008). These studies aim primarily at vali- Ugric agriculturalist populations was also dating genetic variation without ascertain- shown indicating a local adaptation to ment bias, and secondarily to explore the subsistence change with lactose rich diet. evolutionary factor shaping genetic diver- Moreover, the haplotype background of sity. Both large scale genotyping projects lactase persistence allele among the Finno- and fi ne-scale resequencing studies of re- Ugric-speakers strongly suggested that the stricted genome regions assessed in differ-

lactase persistence T-13910 mutation was in- ent populations are needed for further re- troduced independently more than once to fi nements of the recombination hotspot and the North Eurasian gene pool. A signifi - LD block structures within the haplotype cant difference in genetic diversity, haplo- map of the human genome. A solid under- type structure and LD distribution within standing of the human genomic variation the cytochrome P450 CYP2C and CYP2D and haplotype structure within will enable regions revealed the unique gene pool of further determination of our evolutionary the Finno-Ugric Saami created mainly by past and enhance the identifi cation of ge- population genetic processes compared to netic variants underlying common complex other Europeans and sub-Saharan Man- traits and diseases.

39

NNikun06.inddikun06.indd 3939 224.9.20084.9.2008 116:41:126:41:12 6 ACKNOWLEDGEMENTS

This study was carried out between 2002– ond supervisor Professor David Comas for 2008 in the Department of Forensic Medi- a chance to work with him in ever enjoy- cine at the University of Helsinki and as a able atmosphere. Your valuable advice and visiting scientist in the Evolutionary Biol- broad knowledge in human population ge- ogy Unit at the University of Pompeu Fabra netics have kept me on a right track. More- along with a one and a half year vacation over, your never ending good humour and fulfi lling the National Military Service. social skills to survive with the whining The study was fi nancially supported by PhD student have always impressed me. The Finnish Cultural Foundation, the Fed- Professor Ulf Gyllensten and Profes- eration of European Biochemical Societies sor Pekka Pamilo deserve enormous com- and the Finnish Graduate School in Popu- pliments for carefully reviewing my thesis lation Genetics along with grants from the during their summer holiday season and EU and the University of Helsinki. for their valuable comments. I am also in- I wish to thank the former head of the depted to Professor Kimmo Kontula for Forensic Department, Professor Antti Pent- important advice to conduct my fi nal years tilä, and the new director Professor Erkki in PhD studies. Vuori, for providing the excellent research It has always been inspiring and chal- facilities with great academic atmosphere. lenging to work in the OLL-BIO laborato- I also wish to thank Professor Jaume Ber- ry at the Department of Forensic Medicine. tranpetit for inviting me to visit the inspir- I have learned everything about genotyp- ing Evolutionary Biology Unit at the Uni- ing and forensic laboratory work during versity of Pompeu Fabra. these years in Kytösuontie 11. Especially The deepest gratitude I wish to express I want to thank my colleague Minttu Hed- to my supervisor Professor Antti Sajanti- man, with whom I have not only shared an la at the Department of Forensic Medicine. offi ce and scientifi c papers but also mo- When I fi nally managed to meet up with ments of scientifi c joy while fi lling bureau- the world famous Finno-Ugric professor I cratic applications or glasses of wine. Juk- really got excited. Your personal enthusi- ka Palo is also greatly thanked for revising asm and deep knowledge concerning North most of my scientifi c writings and restless- Eurasian Finno-Ugric populations and hu- ly explaining to me the basics of popula- man population genetics in general hooked tion genetics. I also want to express my sin- me. Since then I have learned a lot from cere gratitude to the rest of the former or you about how to conduct scientifi c work. current oll-bio members: Antti L, Hanna, Your broad understanding and interest in Silvia, Johanna, Anna, Yukiko, Eve, Tei- science, genetics and life itself have con- ja, Kirsti, Pia, Helmuth, Katarina, Hannu stantly carried on and stimulated my own and Mikko. Thank you for all the great mo- sometimes restless and narrow mind. Your ments and good coffee breaks. patience with me has been unbelievable, I also had a great pleasure to take part especially when things have gone wrong. in the LD-EUROPE project and work with I have been very lucky to have had the op- great scientists: Laurent Excoffi er, Gil- portunity to work under your supervision. laume Lavall, Howard Cann, Sir Walter I am also greatly indebted to my sec- Bodmer, Susan Tonks, Irina Evseeva, Al-

40

NNikun06.inddikun06.indd 4040 224.9.20084.9.2008 116:41:126:41:12 berto Piazza, Francesca Crobu, Silvana your archaeological inspiration and intel- Santachiara-Benerecetti, Ornella Semi- ligent company while digging treasures in no, Anna Gonzalez-Neira and Claudia de eastern Finland. Risto and Pyry, you are Toma. Thank you for your collaboration. greatly acknowledged for keeping an old I also wish to thank all the bioevo-peo- guy sane in the Finnish forest. Juha and ple in Pompeu Fabra with whom I had the Aki, the two Hakunila musketeers, thank opportunity to enjoy scientifi cally inspiring you for sharing fantastic journeys and moments around the coffee machine—and private discussions. Gerald Matei (1974– obviously all the GFs in Bitacora: Monica, 2007), your never ending energy to get Roger, Stephanie, Ferran, Ixa, Urko, Elo- people to smile and enjoy life are greatly die, Hafi d, Araceli, Chiara, Andrés, Karla, missed. And for Sampsa and Petri – thank Michelle, Gemma, Carles, Josep, Ricardo, you for your friendship. I also wish to Marta, Belen, Anna F, Anna R, Olga, Rui, thank all relatives. Especially Konsta, your Begoña, Oscar, Elena, Fracesc and Arcadi. visual eye and hilarious humour is more Thank you and hope to see you soon. than greatly thanked during the printing of Special BCN thanks go to Martin and this thesis. Renia for being superkind hosts for home- A warmest gratitude and sincere thanks less Finns and always being eager to share are due to my family. Dear Heli and Georg. common weekend adventures. Bruno and Thank you for all your mental and physical Jordi. It’s always a joy even just to talk care. And do not worry, as your prodigal with you and enjoy life while watching op- son will return. Vilma and Wander. Thank eración triumfo. you for all the late dinners and great mo- Numerous friends in Capoeira Forca ments, which have kept me going. Natural are greatly acknowledged for vari- Agnes “Oma”, dear grandmother (1910– ous relaxing moments and aching muscles 2008). I wish you would be here to share a outside the scientifi c world. Antti Korpi- cup of tea with lemon. saari and Jussi Korhonen, thank you for

ACKNOWLEDGEMENTS 41

NNikun06.inddikun06.indd 4141 224.9.20084.9.2008 116:41:126:41:12 7 REFERENCES

Abondolo D (1998) The Uralic Languages. Lon- Beck S, Trowsdale J (2000) The human major don and New York histocompatability complex: Lessons from Ahmadi KR, Weale ME, Xue ZY, Soranzo N, Yar- the DNA sequence. Annu Rev Genomics Hum nall DP, Briley JD, Maruyama Y, Kobayashi M, Genet 1:117–137 Wood NW, Spurr NK, Burns DK, Roses AD, Beja-Pereira A, Luikart G, England PR, Brad- Saunders AM, Goldstein DB (2005) A single- ley DG, Jann OC, Bertorelle G, Chamberlain nucleotide polymorphism tagging set for hu- AT, Nunes TP, Metodiev S, Ferrand N, Erhardt man drug metabolism and transport. Nat Genet G (2003) Gene-culture coevolution between 37:84–89 cattle milk protein genes and human lactase Akey JM, Zhang G, Zhang K, Jin L, Shriver MD genes. Nat Genet 35:311–313 (2002) Interrogating a high-density SNP map Bergman I (2004) Deglaciation and colonization: for signatures of natural selection. Genome Pioneer settlements in Northern Fennoscandia. Res 12:1805–1814 Journal of World Prehistory:155–177 Anderson S, Bankier AT, Barrell BG, de Bruijn Bersaglieri T, Sabeti PC, Patterson N, Vander- MH, Coulson AR, Drouin J, Eperon IC, Nier- ploeg T, Schaffner SF, Drake JA, Rhodes M, lich DP, Roe BA, Sanger F, Schreier PH, Smith Reich DE, Hirschhorn JN (2004) Genetic sig- AJ, Staden R, Young IG (1981) Sequence and natures of strong recent positive selection at organization of the human mitochondrial ge- the lactase gene. Am J Hum Genet 74:1111– nome. Nature 290:457–465 1120 Andrews RM, Kubacka I, Chinnery PF, Lightowl- Bertorelle G, Barbujani G (1995) Analysis of ers RN, Turnbull DM, Howell N (1999) Re- DNA diversity by spatial autocorrelation. Ge- analysis and revision of the Cambridge refer- netics 140:811–819 ence sequence for human mitochondrial DNA. Bertorelle G, Excoffi er L (1998) Inferring ad- Nat Genet 23:147 mixture proportions from molecular data. Mol Ardlie KG, Kruglyak L, Seielstad M (2002) Pat- Biol Evol 15:1298–1311 terns of linkage disequilibrium in the human Bertranpetit J, Calafell F, Comas D, Gonzalez- genome. Nat Rev Genet 3:299–309 Neira A, Navarro A (2003) Structure of link- Armour JA, Anttinen T, May CA, Vega EE, Sajan- age disequilibrium in humans: Genome fac- tila A, Kidd JR, Kidd KK, Bertranpetit J, Paa- tors and population stratifi cation. Cold Spring bo S, Jeffreys AJ (1996) Minisatellite diversi- Harb Symp Quant Biol 68:79–88 ty supports a recent african origin for modern Bosch E, Calafell F, Gonzalez-Neira A, Flaiz C, humans. Nat Genet 13:154–160 Mateu E, Scheil HG, Huckenbeck W, Efre- Arnheim N, Calabrese P, Nordborg M (2003) Hot movska L, Mikerezi I, Xirotiris N, Grasa C, and cold spots of recombination in the human Schmidt H, Comas D (2006) Paternal and genome: The reason we should fi nd them and maternal lineages in the Balkans show a ho- how this can be achieved. Am J Hum Genet mogeneous landscape over linguistic barriers, 73:5–16 except for the isolated Aromuns. Ann Hum Bandelt HJ, Quintana-Murci L, Salas A, Macaul- Genet 70:459–487 ay V (2002) The fi ngerprint of phantom muta- Brinkmann B, Klintschar M, Neuhuber F, Huh- tions in mitochondrial DNA data. Am J Hum ne J, Rolf B (1998) Mutation rate in human Genet 71:1150–1160 microsatellites: Infl uence of the structure and Bandelt HJ, Richards M, Macaulay V (eds) length of the tandem repeat. Am J Hum Genet (2006) Human Mitochondrial DNA and the 62:1408–1415 Evolution of Homo Sapiens. Springer, Berlin Bustamante CD, Fledel-Alon A, Williamson S, Heidelberg New York Nielsen R, Hubisz MT, Glanowski S, Tanen- Barbujani G, Magagni A, Minch E, Cavalli-Sfor- baum DM, White TJ, Sninsky JJ, Hernandez za LL (1997) An apportionment of human RD, Civello D, Adams MD, Cargill M, Clark DNA diversity. Proc Natl Acad Sci U S A AG (2005) Natural selection on protein-cod- 94:4516–4519 ing genes in the human genome. Nature 437:1153–1157

42 REFERENCES

NNikun06.inddikun06.indd 4242 224.9.20084.9.2008 116:41:126:41:12 Cann HM, de Toma C, Cazes L, Legrand MF, Comas D, Plaza S, Wells RS, Yuldaseva N, Lao Morel V, Piouffre L, Bodmer J et al. (2002) A O, Calafell F, Bertranpetit J (2004) Admix- human genome diversity cell line panel. Sci- ture, migrations, and dispersals in central asia: ence 296:261–262 Evidence from maternal DNA lineages. Eur J Cann RL, Stoneking M, Wilson AC (1987) Mito- Hum Genet 12:495–504 chondrial DNA and human evolution. Nature Conrad DF, Jakobsson M, Coop G, Wen X, Wall 325:31–36 JD, Rosenberg NA, Pritchard JK (2006) A Carlson CS, Eberle MA, Kruglyak L, Nickerson worldwide survey of haplotype variation and DA (2004) Mapping complex disease loci in linkage disequilibrium in the human genome. whole-genome association studies. Nature Nat Genet 38:1251–1260 429:446–452 Cooper A, Poinar HN (2000) Ancient DNA: Do it Carroll SB (2003) Genetics and the making of right or not at all. Science 289:1139 Homo sapiens. Nature 422:849–857 Crawford DC, Bhangale T, Li N, Hellenthal G, Cavalli-Sforza LL (2005) The human genome di- Rieder MJ, Nickerson DA, Stephens M (2004) versity project: Past, present and future. Nat Evidence for substantial fi ne-scale variation Rev Genet 6:333–340 in recombination rates across the human ge- Cavalli-Sforza LL, Menozzi P, Piazza A (1994) nome. Nat Genet 36:700–706 The History and Geography of Human Genes. Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Princeton, NJ Princeton University Press Lander ES (2001) High-resolution haplotype Cavalli-Sforza LL, Wilson AC, Cantor CR, Cook- structure in the human genome. Nat Genet Deegan RM, King MC (1991) Call for a world- 29:229–232 wide survey of human genetic diversity: A Darwin CR (1859) The Origin of Species by vanishing opportunity for the human genome Means of Natural Selection. John Murray, project. Genomics 11:490–491 London. Chakraborty R, Weiss KM (1988) Admixture de Knijff P (2000) Messages through bottlenecks: as a tool for fi nding linked genes and de- On the combined use of slow and fast evolv- tecting that difference from allelic associa- ing polymorphic markers on the human Y chro- tion between loci. Proc Natl Acad Sci U.S.A. mosome. Am J Hum Genet 67:1055–1061 85:9119–9123 de la Chapelle A, Wright FA (1998) Linkage dis- Chi PB, Duggal P, Kao WH, Mathias RA, Grant equilibrium mapping in isolated populations: AV, Stockton ML, Garcia JG, Ingersoll RG, The example of Finland revisited. Proc Natl Scott AF, Beaty TH, Barnes KC, Fallin MD Acad Sci U.S.A.95:12416–12423 (2006) Comparison of SNP tagging methods Deloukas P, Bentley D (2004) The HapMap proj- using empirical data: Association study of 713 ect and its application to genetic studies of SNPs on chromosome 12q14.3–12q24.21 for drug response. Pharmacogenomics J 4:88–90 asthma and total serum IgE in an African Ca- Denoeud F, Vergnaud G, Benson G (2003) Pre- ribbean population. Genet Epidemiol 30:609– dicting human minisatellite polymorphism. 619 Genome Res 13:856–867 Chimpanzee Sequencing and Analysis Consor- Derbeneva OA, Starikovskaya EB, Wallace DC, tium (2005) Initial sequence of the chimpan- Sukernik RI (2002) Traces of early Eurasians zee genome and comparison with the human in the Mansi of Northwest Siberia revealed by genome. Nature 437:69–87 mitochondrial DNA analysis. Am J Hum Gen- Clark AG (1990) Inference of haplotypes from et 70:1009–1014 PCR-amplifi ed samples of diploid popula- Derenko M, Malyarchuk B, Denisova GA, tions. Mol Biol Evol 7:111–122 Wozniak M, Dambueva I, Dorzhu C, Luzina F, Clark AG (2004) The role of haplotypes in can- Miscicka-Sliwka D, Zakharov I (2006) Con- didate gene studies. Genet Epidemiol 27:321– trasting patterns of Y-chromosome variation in 333 South Siberian populations from Baikal and Collins FS, Green ED, Guttmacher AE, Guyer Altai-Sayan regions. Hum Genet 118(5):591– MS, US National Human Genome Research 604 Institute (2003) A vision for the future of ge- Derenko M, Malyarchuk B, Denisova G, Wozniak nomics research. Nature 422:835–847 M, Grzybowski T, Dambueva I, Zakharov I (2007a) Y-chromosome haplogroup N disper- sals from South Siberia to Europe. J Hum Gen- et 52:763–770

REFERENCES 43

NNikun06.inddikun06.indd 4343 224.9.20084.9.2008 116:41:136:41:13 Derenko M, Malyarchuk B, Grzybowski T, Den- Forster P, Harding R, Torroni A, Bandelt HJ isova G, Dambueva I, Perkova M, Dorzhu (1996) Origin and evolution of native Amer- C, Luzina F, Lee HK, Vanecek T, Villems R, ican mtDNA variation: A reappraisal. Am J Zakharov I (2007b) Phylogeographic analysis Hum Genet 59:935–945 of mitochondrial DNA in Northern Asian pop- Gabriel SB, Schaffner SF, Nguyen H, Moore JM, ulations. Am J Hum Genet 81:1025–1041 Roy J, Blumenstiel B, Higgins J, DeFelice M, Derenko MV, Grzybowski T, Malyarchuk BA, Lochner A, Faggart M, Liu-Cordero SN, Roti- Dambueva IK, Denisova GA, Czarny J, Dor- mi C, Adeyemo A, Cooper R, Ward R, Lander zhu CM, Kakpakov VT, Miscicka-Sliwka D, ES, Daly MJ, Altshuler D (2002) The struc- Wozniak M, Zakharov IA (2003) Diversity of ture of haplotype blocks in the human genome. mitochondrial DNA lineages in South Siberia. Science 296:2225–2229 Ann Hum Genet 67:391–411 Gibson J, Morton NE, Collins A (2006) Extended Ding K, Zhou K, Zhang J, Knight J, Zhang X, tracts of homozygosity in outbred human pop- Shen Y (2005) The effect of haplotype-block ulations. Hum Mol Genet 15:789–795 defi nitions on inference of haplotype-block Goebel T, Derevianko AP, Petrin VT (1993) Dat- structure and htSNPs selection. Mol Biol Evol ing the middle-to-upper-paleolithic transition 22:148–159 at Kara-Bom. Curr Anthropol 34:452–458 Dolukhanov PM, Shukurov AM, Tarasov PE, Goldstein DB (2001) Islands of linkage disequi- Zaitseva GI (2002) Colonization of North- librium. Nat Genet 29:109–111 ern Eurasia by modern humans: Radiocarbon Goldstein DB, Cavalleri GL (2005) Genom- chronology and environment. Journal of Ar- ics: Understanding human diversity. Nature chaeological Science 29:593–606 437:1241–1242 Dupanloup I, Schneider S, Excoffi er L (2002) A Gonzalez-Neira A, Ke X, Lao O, Calafell F, Na- simulated annealing approach to defi ne the varro A, Comas D, Cann H, Bumpstead S, genetic structure of populations. Mol Ecol Ghori J, Hunt S, Deloukas P, Dunham I, Car- 11:2571–2581 don LR, Bertranpetit J (2006) The portability Ellegren H (2004) Microsatellites: Simple se- of tagSNPs across populations: A worldwide quences with complex evolution. Nat Rev survey. Genome Res 16:323–330 Genet 5:435–445 Greely HT (2001a) Human genome diversity: Enattah NS, Sahi T, Savilahti E, Terwilliger JD, What about the other human genome project? Peltonen L, Jarvela I (2002) Identifi cation of Nat Rev Genet 2:222–227 a variant associated with adult-type hypolacta- Greely HT (2001b) Informed consent and other sia. Nat Genet 30:233–237 ethical issues in human population genetics. Evans DM, Cardon LR (2005) A comparison of Annu Rev Genet 35:785–800 linkage disequilibrium patterns and estimat- Green RE, Krause J, Ptak SE, Briggs AW, Ronan ed population recombination rates across mul- MT, Simons JF, Du L, Egholm M, Rothberg tiple populations. Am J Hum Genet 76:681– JM, Paunovic M, Paabo S (2006) Analysis of 687 one million base pairs of Neanderthal DNA. Excoffi er L, Laval G, Schneider S (2005) Arle- Nature 444:330–336 quin ver. 3.0: An integrated software package Greenberg JH (2000) Indo-European and its Co- for population genetics data analysis. Evolu- sest Relatives; the Eurasiatic Language Family. tionary Bioinformatics Online 1:47–50 Stanford, Stanford University Press Volume 1, Excoffi er L, Slatkin M (1995) Maximum-likeli- Grammar hood estimation of molecular haplotype fre- Guglielmino CR, Piazza A, Menozzi P, Cavalli- quencies in a diploid population. Mol Biol Sforza LL (1990) Uralic genes in Europe. Am Evol 12:921–927 J Phys Anthropol 83:57–68 Finnila S, Lehtonen MS, Majamaa K (2001) Phy- Haldane JBS (1924) A mathematical theory of logenetic network for European mtDNA. Am J natural and artifi cial selection. Transactions Hum Genet 68:1475–1484 of the Cambridge philosophical society Part Fisher RA (1930) The Genetical Theory of Natu- I:19–41 ral Selection. Clarendon Press, Oxford, Oxford Halldorsson BV, Bafna V, Lippert R, Schwartz R, University Press, Oxford De La Vega FM, Clark AG, Istrail S (2004) Optimal haplotype block-free selection of tag- ging SNPs for genome-wide association stud- ies. Genome Res 14:1633–1640

44 REFERENCES

NNikun06.inddikun06.indd 4444 224.9.20084.9.2008 116:41:136:41:13 Hastbacka J, de la Chapelle A, Kaitila I, Sistonen Ingram CJ, Elamin MF, Mulcare CA, Weale P, Weaver A, Lander E (1992) Linkage dis- ME, Tarekegn A, Raga TO, Bekele E, Elamin equilibrium mapping in isolated founder popu- FM, Thomas MG, Bradman N, Swallow DM lations: Diastrophic dysplasia in Finland. Nat (2007) A novel polymorphism associated with Genet 2:204–211 lactose tolerance in Africa: Multiple causes Hedman M, Brandstatter A, Pimenoff V, Sistonen for lactase persistence? Hum Genet 120:779– P, Palo JU, Parson W, Sajantila A (2007) Finn- 788 ish mitochondrial DNA HVS-I and HVS-II International HapMap Consortium (2003) The in- population data. Forensic Sci Int 172:171– ternational HapMap project. Nature 426:789– 178 796 Hewitt GM (1999) Post-glacial recolonization of International HapMap Consortium (2005) A European biota. Biological Journal of the Lin- haplotype map of the human genome. Nature nean Society:87–112 437:1299–1320 Hewitt GM (2001) Speciation, hybrid zones and International HapMap Consortium, Frazer KA, phylogeography – or seeing genes in space Ballinger DG, Cox DR, Hinds DA, Stuve LL, and time. Mol Ecol 10:537–549 Gibbs RA et al. (2007) A second generation Heyer E, Puymirat J, Dieltjes P, Bakker E, de human haplotype map of over 3.1 million Knijff P (1997) Estimating Y chromosome SNPs. Nature 449:851–861 specifi c microsatellite mutation frequencies International Human Genome Sequencing Con- using deep rooting pedigrees. Hum Mol Gen- sortium (2004) Finishing the euchromatic se- et 6:799–803 quence of the human genome. Nature 431:931– Hill WG, Weir BS (1994) Maximum-likelihood 945 estimation of gene location by linkage dis- Jeffreys AJ, Kauppi L, Neumann R (2001) In- equilibrium. Am J Hum Genet 54:705–714 tensely punctate meiotic recombination in the Hinds DA, Stuve LL, Nilsen GB, Halperin E, class II region of the major histocompatibility Eskin E, Ballinger DG, Frazer KA, Cox DR complex. Nat Genet 29:217–222 (2005) Whole-genome patterns of common Jeffreys AJ, Neumann R (2002) Reciprocal cross- DNA variation in three human populations. over asymmetry and meiotic drive in a human Science 307:1072–1079 recombination hot spot. Nat Genet 31:267– Hollox EJ, Poulter M, Zvarik M, Ferak V, Krause 271 A, Jenkins T, Saha N, Kozlov AI, Swallow DM Jeffreys AJ, Ritchie A, Neumann R (2000) High (2001) Lactase haplotype diversity in the old resolution analysis of haplotype diversity and world. Am J Hum Genet 68:160–172 meiotic crossover in the human TAP2 recom- Howell N, Oostra RJ, Bolhuis PA, Spruijt L, bination hotspot. Hum Mol Genet 9:725–733 Clarke LA, Mackey DA, Preston G, Herrn- Jobling MA, Hurles M, Tyler-Smith C (2003) Hu- stadt C (2003) Sequence analysis of the mito- man evolutionary genetics: Origins, peoples chondrial genomes from Dutch pedigrees with and disease. Garland Science Leber hereditary optic neuropathy. Am J Hum Jobling MA, Pandya A, Tyler-Smith C (1997) The Genet 72:1460–1469 Y chromosome in forensic analysis and pater- Ingelman-Sundberg M (2004) Human drug me- nity testing. Int J Legal Med 110:118–124 tabolising cytochrome P450 enzymes: Proper- Jobling MA, Tyler-Smith C (2003) The human Y ties and polymorphisms. Naunyn Schmiede- chromosome: An evolutionary marker comes bergs Arch Pharmacol 369:89–104 of age. Nat Rev Genet 4:598–612 Ingman M, Gyllensten U (2007) A recent genetic Johansson A, Ingman M, Mack SJ, Erlich H, Gyl- link between Sami and the -Ural region lensten U (2008) Genetic origin of the Swed- of Russia. Eur J Hum Genet 15:115–120 ish Sami inferred from HLA class I and class Ingman M, Kaessmann H, Paabo S, Gyllensten II allele frequencies. Eur J Hum Genet (in U (2000) Mitochondrial genome variation press) and the origin of modern humans. Nature Johansson A, Vavruch-Nilsson V, Cox DR, Fraz- 408:708–713 er KA, Gyllensten U (2007) Evaluation of the SNP tagging approach in an independent pop- ulation sample--array-based SNP discovery in Sami. Hum Genet 122:141–150

REFERENCES 45

NNikun06.inddikun06.indd 4545 224.9.20084.9.2008 116:41:136:41:13 Johansson A, Vavruch-Nilsson V, Edin-Liljegren Kayser M, Kittler R, Erler A, Hedman M, Lee AC, A, Sjolander P, Gyllensten U (2005) Linkage Mohyuddin A, Mehdi SQ, Rosser Z, Stonek- disequilibrium between microsatellite mark- ing M, Jobling MA, Sajantila A, Tyler-Smith C ers in the Swedish Sami relative to a world- (2004) A comprehensive survey of human Y- wide selection of populations. Hum Genet chromosomal microsatellites. Am J Hum Gen- 116:105–113 et 74:1183–1197 Johnson GC, Esposito L, Barratt BJ, Smith AN, Kayser M, Roewer L, Hedman M, Henke L, Hen- Heward J, Di Genova G, Ueda H, Cordell HJ, ke J, Brauer S, Kruger C, Krawczak M, Nagy Eaves IA, Dudbridge F, Twells RC, Payne M, Dobosz T, Szibor R, de Knijff P, Stoneking F, Hughes W, Nutland S, Stevens H, Carr P, M, Sajantila A (2000) Characteristics and fre- Tuomilehto-Wolf E, Tuomilehto J, Gough SC, quency of germline mutations at microsatel- Clayton DG, Todd JA (2001) Haplotype tag- lite loci from the human Y chromosome, as re- ging for the identifi cation of common disease vealed by direct observation in father/son pairs. genes. Nat Genet 29:233–237 Am J Hum Genet 66:1580–1588 Jorde LB (2000) Linkage disequilibrium and the Kelley JL, Madeoy J, Calhoun JC, Swanson W, search for complex disease genes. Genome Akey JM (2006) Genomic signatures of posi- Res 10:1435–1444 tive selection in humans and the limits of out- Jorde LB, Rogers AR, Bamshad M, Watkins WS, lier approaches. Genome Res 16:980–989 Krakowiak P, Sung S, Kere J, Harpending HC Kere J (2001) Human population genetics: Les- (1997) Microsatellite diversity and the demo- sons from Finland. Annu Rev Genomics Hum graphic history of modern humans. Proc Natl Genet 2:103–128 Acad Sci U S A 94:3100–3103 Khusnutdinova EK, Viktorova TV, Fatkhlislamo- Jorde LB, Watkins WS, Bamshad MJ, Dixon ME, va RI, Khidiatova IM (1999) Restriction poly- Ricker CE, Seielstad MT, Batzer MA (2000) morphism of the major non-decoding region The distribution of human genetic diversity: of mitochondrial DNA in human populations A comparison of mitochondrial, autosomal, from the Volga-Ural region. Genetika 35:695– and Y-chromosome data. Am J Hum Genet 702 66:979–988 Kidd KK, Pakstis AJ, Speed WC, Kidd JR (2004) Kaessmann H, Paabo S (2002) The genetical his- Understanding human DNA sequence varia- tory of humans and the great apes. J Intern tion. J Hered 95:406–420 Med 251:1–18 Kim Y, Stephan W (2002) Detecting a local sig- Kaessmann H, Zollner S, Gustafsson AC, Wie- nature of genetic hitchhiking along a recom- be V, Laan M, Lundeberg J, Uhlen M, Paabo bining chromosome. Genetics 160:765–777 S (2002) Extensive linkage disequilibrium Kimura M (1968) Evolutionary rate at the molec- in small human populations in Eurasia. Am J ular level. Nature 217:624–626 Hum Genet 70:673–685 Kimura M, Ieiri I, Mamiya K, Urae A, Higuchi S Kaiser J (2008) DNA sequencing. A plan to cap- (1998) Genetic polymorphism of cytochrome ture human diversity in 1000 genomes. Sci- P450s, CYP2C19, and CYP2C9 in a Japanese ence 319:395 population. Ther Drug Monit 20:243–247 Karafet TM, Osipova LP, Gubina MA, Posukh Kittles RA, Bergen AW, Urbanek M, Virkkunen OL, Zegura SL, Hammer MF (2002) High lev- M, Linnoila M, Goldman D, Long JC (1999) els of Y-chromosome differentiation among na- Autosomal, mitochondrial, and Y chromo- tive Siberian populations and the genetic sig- some DNA variation in Finland: Evidence for nature of a boreal hunter-gatherer way of life. a male-specifi c bottleneck. Am J Phys Anthro- Hum Biol 74:761–789 pol 108:381–399 Kauppi L, Sajantila A, Jeffreys AJ (2003) Re- Kittles RA, Perola M, Peltonen L, Bergen AW, combination hotspots rather than population Aragon RA, Virkkunen M, Linnoila M, Gold- history dominate linkage disequilibrium in man D, Long JC (1998) Dual origins of Finns the MHC class II region. Hum Mol Genet revealed by Y chromosome haplotype varia- 12:33–40 tion. Am J Hum Genet 62:1171–1179 Kivisild T, Tolk HV, Parik J, Wang Y, Papiha SS, Bandelt HJ, Villems R (2002) The emerging limbs and twigs of the East Asian mtDNA tree. Mol Biol Evol 19:1737–1751

46 REFERENCES

NNikun06.inddikun06.indd 4646 224.9.20084.9.2008 116:41:136:41:13 Kolga M, Tõnurist I, Vaba L, Viikberg J (2001) Laitinen V, Lahermo P, Sistonen P, Savontaus The Red Book of the Peoples of the Russian ML (2002) Y-chromosomal diversity suggests Empire. Tallinn, NGO Red Book that Baltic males share common Finno-Ugric- Kong A, Gudbjartsson DF, Sainz J, Jonsdottir speaking forefathers. Hum Hered 53:68–78 GM, Gudjonsson SA, Richardsson B, Sigur- Lander ES (1996) The new genomics: Global dardottir S, Barnard J, Hallbeck B, Masson G, views of biology. Science 274:536–539 Shlien A, Palsson ST, Frigge ML, Thorgeirs- Lander ES, Linton LM, Birren B, Nusbaum C, son TE, Gulcher JR, Stefansson K (2002) A Zody MC, Baldwin J, Devon K et al. (2001) high-resolution recombination map of the hu- Initial sequencing and analysis of the human man genome. Nat Genet 31:241–247 genome. Nature 409:860–921 Krebs CJ (1994) Ecology: The experimental anal- Landsteiner K (1931) Individual differences in ysis of distribution and abundance. human blood. Science 73:403–409 Kruglyak L (1999) Prospects for whole-genome Lao O, Lu TT, Nothnagel M, Junge O, Freitag- linkage disequilibrium mapping of common Wolf S, Caliebe A, Balascakova M, Bertran- disease genes. Nat Genet 22:139–144 petit J, Bindoff LA, Comas D, Holmlund G, Kruglyak L, Nickerson DA (2001) Variation is Kouvatsi A, Macek M, Mollet I, Parson W, the spice of life. Nat Genet 27:234–236 Palo J, Ploski R, Sajantila A, Tagliabracci A, Kryukov GV, Pennacchio LA, Sunyaev SR Gether U, Werge T, Rivadeneira F, Hofman A, (2007) Most rare missense alleles are deleteri- Uitterlinden AG, Gieger C, Wichmann HE, ous in humans: Implications for complex dis- Rüther A, Schreiber S, Becker C, Nürnberg P, ease and association studies. Am J Hum Genet Nelson MR, Krawczak M, Kayser M (2008) 80:727–739 Correlation between genetic and geographic Kutuev I, Khusainova R, Karunas A, Yunus- structure in Europe.Curr Biol. 18:1241–1248 bayev B, Fedorova S, Lebedev Y, Hunsmann Lappalainen T, Koivumaki S, Salmela E, Huo- G, Khusnutdinova E (2006) From east to west: ponen K, Sistonen P, Savontaus ML, Laher- Patterns of genetic diversity of populations mo P (2006) Regional differences among the living in four Eurasian regions. Hum Hered Finns: A Y-chromosomal perspective. Gene 61:1–9 376:207–215 Kuzmin YV, Keates SG: Comment on “Coloniza- Lewis DF (2004) 57 varieties: The human cy- tion of Northern Eurasia by Modern Humans: tochromes P450. Pharmacogenomics 5:305– Radiocarbon Chronology and Environment” 318 by P.M. Dolukhanov, A.M. Shukurov, P.E. Lewontin RC (1964) The interaction of selection Tarasov and G.I. Zaitseva. Journal of Archaeo- and linkage. I. general considerations; heterot- logical Science 29, 593–606 (2002). Journal of ic models. Genetics 49:49–67 Archaeological Science, 2004; 31: 141–143 Lewontin RC (1972) The apportionment of hu- Laakso J (1992) Uralilaiset kansat. Tietoa suomen man diversity. Evol Biol 6:381–398 sukukielistä ja niiden puhujista. WSOY, Juva Lewontin RC (1988) On measures of gametic dis- Laan M, Paabo S (1997) Demographic history equilibrium. Genetics 120:849–852 and linkage disequilibrium in human popula- Lewontin RC, Kojima K (1960) The evolutionary tions. Nat Genet 17:435–438 dynamics of complex polymorphisms. Evolu- Laan M, Wiebe V, Khusnutdinova E, Remm M, tion 14:458–472 Paabo S (2005) X-chromosome as a marker Li N, Stephens M (2003) Modeling linkage dis- for population history: Linkage disequilibri- equilibrium and identifying recombination um and haplotype study in Eurasian popula- hotspots using single-nucleotide polymor- tions. Eur J Hum Genet 13:452–462 phism data. Genetics 165:2213–2233 Lahermo P, Sajantila A, Sistonen P, Lukka M, Mahadevan M, Tsilfi dis C, Sabourin L, Shutler Aula P, Peltonen L, Savontaus ML (1996) G, Amemiya C, Jansen G, Neville C, Narang The genetic relationship between the Finns M, Barcelo J, O’Hoy K (1992) Myotonic dys- and the Finnish Saami (Lapps): Analysis of trophy mutation: An unstable CTG repeat in nuclear DNA and mtDNA. Am J Hum Genet the 3’ untranslated region of the gene. Science 58:1309–1322 255:1253–1255 Lahermo P, Savontaus ML, Sistonen P, Beres J, May CA, Shone AC, Kalaydjieva L, Sajantila A, de Knijff P, Aula P, Sajantila A (1999) Y chro- Jeffreys AJ (2002) Crossover clustering and mosomal polymorphisms reveal founding lin- rapid decay of linkage disequilibrium in the eages in the Finns and the Saami. Eur J Hum Xp/Yp pseudoautosomal gene SHOX. Nat Genet 7:447–458 Genet 31:272–275

REFERENCES 47

NNikun06.inddikun06.indd 4747 224.9.20084.9.2008 116:41:136:41:13 McVean G, Spencer CC, Chaix R (2005) Perspec- Nielsen R, Hellmann I, Hubisz M, Bustamante tives on human genetic variation from the Hap- C, Clark AG (2007) Recent and ongoing se- Map project. PLoS Genet 1:e54 lection in the human genome. Nat Rev Genet Meinila M, Finnila S, Majamaa K (2001) Evi- 8:857–868 dence for mtDNA admixture between the fi nns Nielsen R, Williamson S, Kim Y, Hubisz MJ, and the saami. Hum Hered 52:160–170 Clark AG, Bustamante C (2005) Genomic Metspalu M, Kivisild T, Metspalu E, Parik J, scans for selective sweeps using SNP data. Hudjashov G, Kaldma K, Serk P, Karmin M, Genome Res 15:1566–1575 Behar DM, Gilbert MT, Endicott P, Mastana Niu CY, Luo JY, Hao ZM (2004) Genetic poly- S, Papiha SS, Skorecki K, Torroni A, Villems morphism analysis of cytochrome P4502C19 R (2004) Most of the extant mtDNA bound- in Chinese Uigur and Han populations. Chin J aries in South and Southwest Asia were likely Dig Dis 5:76–80 shaped during the initial settlement of Eurasia Nordqvist B (2000) Coastal adaptations in the by anatomically modern humans. BMC Gen- mesolitic. A study of costal sites with organic et 5:26 remains from the Boreal and Atlantic periods Michalatos-Beloin S, Tishkoff SA, Bentley KL, in Western Sweden. GOTARC Series B 13 Kidd KK, Ruano G (1996) Molecular haplo- Norio R (2003a) Finnish disease heritage I: Char- typing of genetic markers 10 kb apart by al- acteristics, causes, background. Hum Genet lele-specifi c long-range PCR. Nucleic Acids 112:441–456 Res 24:4841–4843 Norio R (2003b) Finnish disease heritage II: Pop- Mir KU, Southern EM (2000) Sequence varia- ulation prehistory and genetic roots of Finns. tion in genes and genomic DNA: Methods Hum Genet 112:457–469 for large-scale analysis. Annu Rev Genomics Norio R (2003c) The Finnish disease heritage III: Hum Genet 1:329–360 The individual diseases. Hum Genet 112:470– Mueller JC, Lohmussaar E, Magi R, Remm M, 526 Bettecken T, Lichtner P, Biskup S, Illig T, Ohta T (2002) Near-neutrality in evolution of Pfeufer A, Luedemann J, Schreiber S, Pram- genes and gene regulation. Proc Natl Acad Sci staller P, Pichler I, Romeo G, Gaddi A, Testa U S A 99:16134–16137 A, Wichmann HE, Metspalu A, Meitinger T Ota T, Kimura M (1973) A model of mutation (2005) Linkage disequilibrium patterns and appropriate to estimate the number of electro- tagSNP transferability among European pop- phoretically detectable alleles in a fi nite popu- ulations. Am J Hum Genet 76:387–398 lation. Genet Res 22:201–204 Myers S, Bottolo L, Freeman C, McVean G, Don- Paabo S (1989) Ancient DNA: Extraction, char- nelly P (2005) A fi ne-scale map of recombina- acterization, molecular cloning, and enzymat- tion rates and hotspots across the human ge- ic amplifi cation. Proc Natl Acad Sci U S A nome. Science 310:321–324 86:1939–1943 Myles S, Bouzekri N, Haverfi eld E, Cherkaoui Palo JU, Hedman M, Ulmanen I, Lukka M, Sa- M, Dugoujon JM, Ward R (2005) Genetic evi- jantila A (2007) High degree of Y-chromosom- dence in support of a shared Eurasian-North al divergence within Finland – forensic aspects. African dairying origin. Hum Genet 117:34– Forensic Sci Int Genet 1:120–124 42 Patil N, Berno AJ, Hinds DA, Barrett WA, Doshi Nachman MW (2001) Single nucleotide poly- JM, Hacker CR, Kautzer CR, Lee DH, Marjor- morphisms and recombination rate in humans. ibanks C, McDonough DP, Nguyen BT, Norris Trends Genet 17:481–485 MC, Sheehan JB, Shen N, Stern D, Stokowski Nachman MW, Crowell SL (2000) Estimate of RP, Thomas DJ, Trulson MO, Vyas KR, Fraz- the mutation rate per nucleotide in humans. er KA, Fodor SP, Cox DR (2001) Blocks of Genetics 156:297–304 limited haplotype diversity revealed by high- Nevanlinna HR (1972) The Finnish population resolution scanning of human chromosome 21. structure. A genetic and genealogical study. Science 294:1719–1723 Hereditas 71:195–236 Pavlov P, Svendsen JI, Indrelid S (2001) Human Nevanlinna HR (1984) The roots of the Finns in presence in the European Arctic nearly 40,000 the light of genetic research into distinguish- years ago. Nature 413:64–67 ing genetic characteristics. Soc Sci Fenni- Peltonen L, Palotie A, Lange K (2000) Use of ca:157–174 population isolates for mapping complex traits. Nat Rev Genet 1:182–190

48 REFERENCES

NNikun06.inddikun06.indd 4848 224.9.20084.9.2008 116:41:136:41:13 Perheentupa J (1995) The Finnish disease heri- Richards M, Macaulay V, Hickey E, Vega E, tage: A personal look. Acta Paediatr 84:1094– Sykes B, Guida V, Rengo C et al. (2000) Trac- 1099 ing european founder lineages in the near east- Phillips MS, Lawrence R, Sachidanandam R, ern mtDNA pool. Am J Hum Genet 67:1251– Morris AP, Balding DJ, Donaldson MA, 1276 Studebaker JF et al. (2003) Chromosome- Romualdi C, Balding D, Nasidze IS, Risch G, Ro- wide distribution of haplotype blocks and the bichaux M, Sherry ST, Stoneking M, Batzer role of recombination hot spots. Nat Genet MA, Barbujani G (2002) Patterns of human 33:382–387 diversity, within and among continents, in- Pimenoff V, Korpisaari A (2004) A preliminary ferred from biallelic DNA polymorphisms. report on the genetic analysis of the osteologi- Genome Res 12:602–612 cal remains of Tiwanaku tombs in Tiraska and Rootsi S, Zhivotovsky LA, Baldovic M, Kayser Aymara chullpa C17 in kewaya. Report sub- M, Kutuev IA, Khusainova R, Bermisheva mitted to DINAAR (National Directorate of MA, Gubina M, Fedorova SA, Ilumae AM, Archaeology and Anthropology, Bolivia). Khusnutdinova EK, Voevoda MI, Osipova Pimenoff V, Sajantila A (2002) Genetic Tools in LP, Stoneking M, Lin AA, Ferak V, Parik J, Molecular Anthropology and Archaeology. Kivisild T, Underhill PA, Villems R (2007) The Roots of Peoples and Languages of the A counter-clockwise northern route of the Northern Eurasia IV Ed. K. Julku, Gummerus Y-chromosome haplogroup N from south- OY. Jyväskylä, Finland. east asia towards europe. Eur J Hum Genet Pritchard JK, Stephens M, Donnelly P (2000) In- 15:204–211 ference of population structure using multilo- Rosenberg NA, Pritchard JK, Weber JL, Cann cus genotype data. Genetics 155:945–959 HM, Kidd KK, Zhivotovsky LA, Feldman Pult I, Sajantila A, Simanainen J, Georgiev O, MW (2002) Genetic structure of human popu- Schaffner W, Paabo S (1994) Mitochondrial lations. Science 298:2381–2385 DNA sequences from switzerland reveal strik- Ross AB, Johansson A, Ingman M, Gyllensten U ing homogeneity of european populations. Biol (2006) Lifestyle, genetics, and disease in sami. Chem Hoppe Seyler 375:837–840 Croat Med J 47:553–565 Quintana-Murci L, Chaix R, Wells RS, Behar Rosser ZH, Zerjal T, Hurles ME, Adojaan M, Ala- DM, Sayar H, Scozzari R, Rengo C, Al-Zah- vantic D, Amorim A, Amos W et al. (2000) Y- ery N, Semino O, Santachiara-Benerecetti AS, chromosomal diversity in europe is clinal and Coppa A, Ayub Q, Mohyuddin A, Tyler-Smith infl uenced primarily by geography, rather C, Qasim Mehdi S, Torroni A, McElreavey K than by language. Am J Hum Genet 67:1526– (2004) Where west meets east: The complex 1543 mtDNA landscape of the southwest and cen- Ruano G, Kidd KK, Stephens JC (1990) Haplo- tral asian corridor. Am J Hum Genet 74:827– type of multiple polymorphisms resolved by 845 enzymatic amplifi cation of single DNA mol- Raitio M, Lindroos K, Laukkanen M, Pastinen ecules. Proc Natl Acad Sci U S A 87:6296– T, Sistonen P, Sajantila A, Syvanen AC (2001) 6300 Y-chromosomal SNPs in fi nno-ugric-speaking Sabeti PC, Schaffner SF, Fry B, Lohmueller J, populations analyzed by minisequencing on Varilly P, Shamovsky O, Palma A, Mikkelsen microarrays. Genome Res 11:471–482 TS, Altshuler D, Lander ES (2006) Positive Redon R, Ishikawa S, Fitch KR, Feuk L, Per- natural selection in the human lineage. Sci- ry GH, Andrews TD, Fiegler H et al. (2006) ence 312:1614–1620 Global variation in copy number in the human Sabeti PC, Varilly P, Fry B, Lohmueller J, genome. Nature 444:444–454 Hostetter E, Cotsapas C, Xie X et al. (2007) Reich DE, Cargill M, Bolk S, Ireland J, Sabe- Genome-wide detection and characterization ti PC, Richter DJ, Lavery T, Kouyoumjian of positive selection in human populations. R, Farhadian SF, Ward R, Lander ES (2001) Nature 449:913–918 Linkage disequilibrium in the human genome. Sahi T, Isokoski M, Jussila J, Launiala K, Pyorala Nature 411:199–204 K (1973) Recessive inheritance of adult-type Reich DE, Lander ES (2001) On the allelic spec- lactose malabsorption. Lancet 2:823–826 trum of human disease. Trends Genet 17:502– Saillard J, Forster P, Lynnerup N, Bandelt HJ, 510 Norby S (2000) mtDNA variation among greenland eskimos: The edge of the beringian expansion. Am J Hum Genet 67:718–726

REFERENCES 49

NNikun06.inddikun06.indd 4949 224.9.20084.9.2008 116:41:136:41:13 Sajantila A, Lahermo P, Anttinen T, Lukka M, Si- Syvanen AC (2001) Accessing genetic variation: stonen P, Savontaus ML, Aula P, Beckman L, Genotyping single nucleotide polymorphisms. Tranebjaerg L, Gedde-Dahl T, Issel-Tarver L, Nat Rev Genet 2:930–942 DiRienzo A, Paabo S (1995) Genes and lan- Tambets K, Rootsi S, Kivisild T, Help H, Serk P, guages in europe: An analysis of mitochondri- Loogvali EL, Tolk HV et al. (2004) The west- al lineages. Genome Res 5:42–52 ern and eastern roots of the saami--the story Sajantila A, Lukka M, Syvanen AC (1999) Exper- of genetic “outliers” told by mitochondrial imentally observed germline mutations at hu- DNA and Y chromosomes. Am J Hum Genet man micro- and minisatellite loci. Eur J Hum 74:661–682 Genet 7:263–266 Terwilliger JD, Hiekkalinna T (2006) An utter Sajantila A, Salem AH, Savolainen P, Bauer K, refutation of the “fundamental theorem of the Gierig C, Paabo S (1996) Paternal and mater- HapMap”. Eur J Hum Genet 14:426–437 nal DNA lineages reveal a bottleneck in the Terwilliger JD, Zollner S, Laan M, Paabo S founding of the fi nnish population. Proc Natl (1998) Mapping genes through the use of Acad Sci U S A 93:12035–12039 linkage disequilibrium generated by genet- Sanger F, Nicklen S, Coulson AR (1977) DNA ic drift: ‘drift mapping’ in small populations sequencing with chain-terminating inhibitors. with no demographic expansion. Hum Hered Proc Natl Acad Sci U S A 74:5463–5467 48:138–154 Semino O, Passarino G, Oefner PJ, Lin AA, Ar- Tishkoff SA, Dietzsch E, Speed W, Pakstis AJ, buzova S, Beckman LE, De Benedictis G, Kidd JR, Cheung K, Bonne-Tamir B, San- Francalacci P, Kouvatsi A, Limborska S, Mar- tachiara-Benerecetti AS, Moral P, Krings M cikiae M, Mika A, Mika B, Primorac D, San- (1996) Global patterns of linkage disequilib- tachiara-Benerecetti AS, Cavalli-Sforza LL, rium at the CD4 locus and modern human ori- Underhill PA (2000) The genetic legacy of pa- gins. Science 271:1380–1387 leolithic homo sapiens sapiens in extant euro- Tishkoff SA, Kidd KK (2004) Implications of peans: A Y chromosome perspective. Science biogeography of human populations for ‘race’ 290:1155–1159 and medicine. Nat Genet 36:S21–7 Service S, DeYoung J, Karayiorgou M, Roos JL, Tishkoff SA, Reed FA, Ranciaro A, Voight Pretorious H, Bedoya G, Ospina J et al. (2006) BF, Babbitt CC, Silverman JS, Powell K, Magnitude and distribution of linkage disequi- Mortensen HM, Hirbo JB, Osman M, Ibrahim librium in population isolates and implications M, Omar SA, Lema G, Nyambo TB, Ghori J, for genome-wide association studies. Nat Bumpstead S, Pritchard JK, Wray GA, Delou- Genet 38:556–560 kas P (2007) Convergent adaptation of human Slatkin M (2008) Linkage disequilibrium--under- lactase persistence in africa and europe. Nat standing the evolutionary past and mapping the Genet 39:31–40 medical future. Nat Rev Genet 9:477–485 Tishkoff SA, Verrelli BC (2003) Role of evolu- Slatkin M, Excoffi er L (1996) Testing for link- tionary history on haplotype block structure in age disequilibrium in genotypic data using the the human genome: Implications for disease expectation-maximization algorithm. Heredity mapping. Curr Opin Genet Dev 13:569–575 76:377–383 Torroni A, Achilli A, Macaulay V, Richards M, Smith JM, Haigh J (1974) The hitch-hiking effect Bandelt HJ (2006) Harvesting the fruit of the of a favourable gene. Genet Res 23:23–35 human mtDNA tree. Trends Genet 22:339– Stephens M, Smith NJ, Donnelly P (2001) A new 345 statistical method for haplotype reconstruc- Torroni A, Bandelt HJ, D’Urbano L, Lahermo P, tion from population data. Am J Hum Genet Moral P, Sellitto D, Rengo C, Forster P, Savon- 68:978–989 taus ML, Bonne-Tamir B, Scozzari R (1998) Stumpf MP, Goldstein DB (2003) Demography, mtDNA analysis reveals a major late paleo- recombination hotspot intensity, and the block lithic population expansion from southwest- structure of linkage disequilibrium. Curr Biol ern to northeastern europe. Am J Hum Genet 13:1–8 62:1137–1152 Swallow DM (2003) Genetics of lactase persis- Torroni A, Bandelt HJ, Macaulay V, Richards M, tence and lactose intolerance. Annu Rev Genet Cruciani F, Rengo C, Martinez-Cabrera V et 37:197–219 al. (2001) A signal, from human mtDNA, of postglacial recolonization in europe. Am J Hum Genet 69:844–852

50 REFERENCES

NNikun06.inddikun06.indd 5050 224.9.20084.9.2008 116:41:146:41:14 Underhill PA, Shen P, Lin AA, Jin L, Passarino G, Wang N, Akey JM, Zhang K, Chakraborty R, Jin Yang WH, Kauffman E, Bonne-Tamir B, Ber- L (2002) Distribution of recombination cross- tranpetit J, Francalacci P, Ibrahim M, Jenkins overs and the origin of haplotype blocks: The T, Kidd JR, Mehdi SQ, Seielstad MT, Wells interplay of population history, recombina- RS, Piazza A, Davis RW, Feldman MW, Caval- tion, and mutation. Am J Hum Genet 71:1227– li-Sforza LL, Oefner PJ (2000) Y chromosome 1234 sequence variation and the history of human Waples RS, Gaggiotti O (2006) What is a popula- populations. Nat Genet 26:358–361 tion? an empirical evaluation of some genetic Varilo T, Laan M, Hovatta I, Wiebe V, Terwilliger methods for identifying the number of gene JD, Peltonen L (2000) Linkage disequilibrium pools and their degree of connectivity. Mol in isolated populations: Finland and a young Ecol 15:1419–1439 sub-population of kuusamo. Eur J Hum Genet Watkins WS, Rogers AR, Ostler CT, Wooding S, 8:604–612 Bamshad MJ, Brassington AM, Carroll ML, Varilo T, Paunio T, Parker A, Perola M, Meyer J, Nguyen SV, Walker JA, Prasad BV, Reddy PG, Terwilliger JD, Peltonen L (2003) The inter- Das PK, Batzer MA, Jorde LB (2003) Genet- val of linkage disequilibrium (LD) detected ic variation among world populations: Infer- with microsatellite and SNP markers in chro- ences from 100 alu insertion polymorphisms. mosomes of fi nnish populations with different Genome Res 13:1607–1618 histories. Hum Mol Genet 12:51–59 Weber JL (1990) Human DNA polymorphisms Varilo T, Savukoski M, Norio R, Santavuori P, and methods of analysis. Curr Opin Biotech- Peltonen L, Jarvela I (1996) The age of hu- nol 1:166–171 man mutation: Genealogical and linkage dis- Wedlund PJ (2000) The CYP2C19 enzyme poly- equilibrium analysis of the CLN5 mutation morphism. Pharmacology 61:174–183 in the fi nnish population. Am J Hum Genet Weiss KM, Clark AG (2002) Linkage disequi- 58:506–512 librium and the mapping of complex human Vasil’ev SA, Kuzmin YV, Orlova LA, Dementiev traits. Trends Genet 18:19–24 VN (2002) Radiocarbon-based chronology of Wells RS, Yuldasheva N, Ruzibakiev R, Under- the paleolithic of siberia and its relevance to hill PA, Evseeva I, Blue-Smith J, Jin L et al. the peopling of the new world. Radiocarbon (2001) The eurasian heartland: A continental 44(2):503–530 perspective on Y-chromosome diversity. Proc Venter JC, Adams MD, Myers EW, Li PW, Mu- Natl Acad Sci U S A 98:10244–10249 ral RJ, Sutton GG, Smith HO et al. (2001) Wilkinson GR (2005) Drug metabolism and vari- The sequence of the human genome. Science ability among patients in drug response. N 291:1304–1351 Engl J Med 352:2211–2221 Vilkki J, Savontaus ML, Nikoskelainen EK Williamson SH, Hubisz MJ, Clark AG, Payseur (1988) Human mitochondrial DNA types in BA, Bustamante CD, Nielsen R (2007) Local- fi nland. Hum Genet 80:317–321 izing recent adaptive evolution in the human Vormfelde SV, Schirmer M, Toliat MR, Meineke genome. PLoS Genet 3:e90 I, Kirchheiner J, Nurnberg P, Brockmoller J Wright S (1931) Evolution in Mendelian popula- (2007) Genetic variation at the CYP2C locus tions. Genetics 16:97–159 and its association with torsemide biotransfor- Xiao ZS, Goldstein JA, Xie HG, Blaisdell J, mation. Pharmacogenomics J 7:200–211 Wang W, Jiang CH, Yan FX, He N, Huang SL, Wall JD, Pritchard JK (2003) Haplotype blocks Xu ZH, Zhou HH (1997) Differences in the in- and linkage disequilibrium in the human ge- cidence of the CYP2C19 polymorphism affect- nome. Nat Rev Genet 4:587–597 ing the S-mephenytoin phenotype in chinese Walton R, Kimber M, Rockett K, Trafford C, han and bai populations and identifi cation of Kwiatkowski D, Sirugo G (2005) Haplo- a new rare CYP2C19 mutant allele. J Pharma- type block structure of the cytochrome P450 col Exp Ther 281:604–609 CYP2C gene cluster on chromosome 10. Nat Xie HG, Stein CM, Kim RB, Wilkinson GR, Genet 37:915–6; author reply 916 Flockhart DA, Wood AJ (1999) Allelic, ge- Wang ET, Kodama G, Baldi P, Moyzis RK (2006) notypic and phenotypic distributions of S- Global landscape of recent inferred darwinian mephenytoin 4’-hydroxylase (CYP2C19) in selection for homo sapiens. Proc Natl Acad healthy caucasian populations of european de- Sci U S A 103:135–140 scent throughout the world. Pharmacogenetics 9:539–549

REFERENCES 51

NNikun06.inddikun06.indd 5151 224.9.20084.9.2008 116:41:146:41:14 Xu X, Peng M, Fang Z (2000) The direction of Zerjal T, Dashnyam B, Pandya A, Kayser M, microsatellite mutations is dependent upon al- Roewer L, Santos FR, Schiefenhovel W, lele length. Nat Genet 24:396–399 Fretwell N, Jobling MA, Harihara S, Shi- Y Chromosome Consortium (2002) A nomen- mizu K, Semjidmaa D, Sajantila A, Salo P, clature system for the tree of human Y-chro- Crawford MH, Ginter EK, Evgrafov OV, Ty- mosomal binary haplogroups. Genome Res ler-Smith C (1997) Genetic relationships of 12:339–348 asians and northern europeans, revealed by Y- Zeggini E, Barton A, Eyre S, Ward D, Ollier W, chromosomal DNA analysis. Am J Hum Genet Worthington J, John S (2005) Characterisation 60:1174–1183 of the genomic architecture of human chromo- Zerjal T, Wells RS, Yuldasheva N, Ruzibakiev some 17q and evaluation of different methods R, Tyler-Smith C (2002) A genetic landscape for haplotype block defi nition. BMC Genet reshaped by recent events: Y-chromosomal 6:21 insights into central asia. Am J Hum Genet Zerjal T, Beckman L, Beckman G, Mikelsaar AV, 71:466–482 Krumina A, Kucinskas V, Hurles ME, Tyler- Zhang W, Collins A, Maniatis N, Tapper W, Mor- Smith C (2001) Geographical, linguistic, and ton NE (2002) Properties of linkage disequi- cultural infl uences on genetic diversity: Y- librium (LD) maps. Proc Natl Acad Sci U S A chromosomal distribution in northern europe- 99:17004–17007 an populations. Mol Biol Evol 18:1077–1087 Zhivotovsky LA, Rosenberg NA, Feldman MW (2003) Features of evolution and expansion of modern humans, inferred from genomewide microsatellite markers. Am J Hum Genet 72:1171–1186

52 REFERENCES

NNikun06.inddikun06.indd 5252 224.9.20084.9.2008 116:41:146:41:14