Genetic Structure in Finland and Sweden: Aspects of Population History and Gene Mapping
GENETIC STRUCTURE IN FINLAND AND SWEDEN: ASPECTS OF POPULATION HISTORY AND GENE MAPPING

Elina Salmela

Department of Medical Genetics, Haartman Institute,
Research Programs Unit, Molecular Medicine, and
Institute for Molecular Medicine Finland FIMM
University of Helsinki

Academic dissertation

To be presented, with the permission of the Faculty of Medicine of the University of Helsinki, for public examination in Auditorium XII, Main Building, Fabianinkatu 33, Helsinki, on October 5th 2012, at 12 noon

Helsinki 2012

Supervisors

Professor Juha Kere
Department of Medical Genetics, University of Helsinki, Helsinki, Finland
Folkhälsan Institute of Genetics, Helsinki, Finland
Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden

Docent Päivi Lahermo
Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland

Reviewers

Professor Jaakko Ignatius
Department of Clinical Genetics, Turku University Hospital, Turku, Finland

Professor Marjo-Riitta Järvelin
School of Public Health, Imperial College London, London, UK

Opponent

Docent Mattias Jakobsson
Department of Evolutionary Biology, Uppsala University, Uppsala, Sweden

ISBN 978-952-10-8190-3 (paperback)
ISBN 978-952-10-8191-0 (PDF)
http://ethesis.helsinki.fi/
Layout: Jere Kasanen
Cover: Ancestries by Admixture. Modified from Figure 14.
Unigrafia, Helsinki 2012

To my family

Houkat ja viisahat uurteet otsillaan
että mihin ja miksi ja mistä, mikä johti?
Aika ei paljasta arvoituksiaan,
meidän mukanaan niitä vain käytävä on kohti.

Pauli Hanhiniemi: Äärelä (2006)
In Hehkumo: Muistoja tulevaisuudesta

Fools and wise people wonder with furrowed brows:
what, where does it lead to, why, and where from?
Time does not tell us its mysteries –
all we can do is wander with it to meet them.

Translated by Sara Norja

CONTENTS

LIST OF ORIGINAL PUBLICATIONS
ABBREVIATIONS
ABSTRACT
TIIVISTELMÄ (abstract in Finnish)
INTRODUCTION
REVIEW OF THE LITERATURE
  1. Basics of population genetics
    1.1. Inheritance
    1.2. Allele frequencies
      1.2.1. Mutation
      1.2.2. Genetic drift
      1.2.3. Selection
      1.2.4. Migration
    1.3. Genotypes
    1.4. Haplotypes, recombination, and linkage disequilibrium
  2. Human genome structure
    2.1. Types and patterns of variation
      2.1.1. Single nucleotide polymorphisms (SNPs)
      2.1.2. Microsatellites
      2.1.3. Structural variation
    2.2. Haplotype structure
  3. Gene mapping strategies
    3.1. Linkage analysis
      3.1.1. Suitability for different populations and phenotypes
    3.2. Association analysis
      3.2.1. Suitability for different populations
      3.2.2. Suitability for different phenotypes
    3.3. Complementary methods
  4. Genetic inference of population history
    4.1. General principles
    4.2. Genome-wide SNP datasets in studies of population history
    4.3. The multidisciplinary inference of human past
  5. Finland
    5.1. Population prehistory and history
    5.2. Language
    5.3. Genetic structure
      5.3.1. Finns compared to other European populations
      5.3.2. Eastern influence in the Finnish gene pool
      5.3.3. Finns and their Saami neighbors
      5.3.4. Genetic structure within Finland
      5.3.5. Phenotypic variation in Finland
      5.3.6. East-west difference in cardiovascular diseases in Finland
      5.3.7. Age estimates of Finnish genes
  6. Sweden
    6.1. Population prehistory and history
    6.2. Language
    6.3. Genetic structure
      6.3.1. Swedes compared to other European populations
      6.3.2. Genetic structure within Sweden
AIMS OF THE STUDY
MATERIALS AND METHODS
  1. Samples
    1.1. Finns (I, II, III, IV)
    1.2. Swedes (II, III, IV)
    1.3. Reference populations (III, IV)
    1.4. Simulated population data (I)
  2. Markers, genotyping, and quality control
  3. Analyses
    3.1. Model-free clustering of individuals: multidimensional scaling
    3.2. Model-based clustering of individuals: Structure, Admixture, and Geneland
    3.3. IBS distributions within and between populations
    3.4. SNP-based inspection of eastern admixture
    3.5. Distances between populations: FST and allele frequency differences
    3.6. Hierarchical population subdivision, AMOVA, and inbreeding
    3.7. Linkage disequilibrium
    3.8. Correlation of genetic and geographic distances
    3.9. Tests of the subpopulation difference scanning (SDS) approach
    3.10. Enrichment analyses
RESULTS
  1. Population structure in Finland and Sweden
    1.1. Model-free clustering of individuals: multidimensional scaling
    1.2. Model-based clustering of individuals: Structure, Admixture, and Geneland
    1.3. IBS distributions between populations
    1.4. SNP-based inspection of eastern admixture
    1.5. Distances between populations: FST and allele frequency differences
    1.6. Hierarchical population subdivision, AMOVA, and inbreeding
    1.7. IBS distributions within populations
    1.8. Linkage disequilibrium
    1.9. Correlation of genetic and geographic distances
  2. Subpopulation difference scanning (SDS) approach
    2.1. Population simulations
    2.2. East-west differences in the genome-wide SNP dataset
DISCUSSION
  1. Technical aspects
    1.1. Quality control of genome-wide SNP data for population genetics
    1.2. Comparison of model-based methods of individual clustering
  2. Population structure in Finland and Sweden
    2.1. Comparison of Finland and Sweden
    2.2. Finland
      2.2.1. Eastern influence
      2.2.2. East-west difference
      2.2.3. Genetic vs. archaeological view of Finnish population history
    2.3. Sweden
    2.4. Limitations of the results
  3. Subpopulation difference scanning (SDS) approach
    3.1. The idea
    3.2. Theoretical testing, advantages, and limitations
    3.3. Analyses of genome-wide population data
CONCLUDING REMARKS AND FUTURE PROSPECTS
ACKNOWLEDGEMENTS
REFERENCES

LIST OF ORIGINAL PUBLICATIONS

This thesis is based on the following original publications. They are referred to in the text by their Roman numerals.
In addition, some unpublished data are presented.

I    Salmela E, Taskinen O, Seppänen JK, Sistonen P, Daly MJ, Lahermo P, Savontaus ML, Kere J. Subpopulation difference scanning: a strategy for exclusion mapping of susceptibility genes. J Med Genet. 2006; 43(7):590-7.

II   Hannelius U, Salmela E, Lappalainen T, Guillot G, Lindgren CM, von Döbeln U, Lahermo P, Kere J. Population substructure in Finland and Sweden revealed by the use of spatial coordinates and a small number of unlinked autosomal SNPs. BMC Genet. 2008; 9:54.

III  Salmela E*, Lappalainen T*, Fransson I, Andersen PM, Dahlman-Wright K, Fiebig A, Sistonen P, Savontaus ML, Schreiber S, Kere J, Lahermo P. Genome-wide analysis of single nucleotide polymorphisms uncovers population structure in Northern Europe. PLoS One. 2008; 3(10):e3519.

IV   Salmela E, Lappalainen T, Liu J, Sistonen P, Andersen PM, Schreiber S, Savontaus ML, Czene K, Lahermo P, Hall P, Kere J. Swedish population substructure revealed by genome-wide single nucleotide polymorphism data. PLoS One. 2011; 6(2):e16747.

* Equal contribution.

Publications II and III have earlier appeared as part of the doctoral theses of Ulf Hannelius (Stockholm 2008) and Tuuli Lappalainen (Helsinki 2009), respectively.

These publications have been reproduced with permission from their copyright holders. The open access publications II-IV are freely available online.
ABBREVIATIONS

AD      Anno Domini
aDNA    ancient DNA
AIM     ancestry-informative marker
AMI     acute myocardial infarction
AMOVA   analysis of molecular variance
BC      before Christ
BRCA1   breast cancer 1, early onset gene
BRCA2   breast cancer 2, early onset gene
CEU     Utah residents with ancestry from northern and western Europe
CHB     Han Chinese in Beijing, China
CI      confidence interval
cM      centimorgan
CVD     cardiovascular disease
CYP21   steroid 21-hydroxylase gene
DNA     deoxyribonucleic acid
DIP     deletion/insertion polymorphism
FDH     Finnish disease heritage
GWAS    genome-wide association study
HGDP    Human Genome Diversity Project
HLA     human leukocyte antigen
HWE     Hardy-Weinberg equilibrium
IBS     identity by state
JPT     Japanese in Tokyo, Japan
kb      kilobase
KEGG    Kyoto Encyclopedia of Genes and Genomes
LD      linkage disequilibrium
LOD     logarithm of odds
MAF     minor allele frequency
Mb      megabase
MCAD    medium-chain acyl-CoA dehydrogenase
MDS     multidimensional scaling
mtDNA   mitochondrial DNA
NHGRI   National Human Genome Research Institute
OR      odds ratio
PAR     pseudoautosomal region
PCA     principal component analysis
QC      quality control
RNA     ribonucleic acid
SDS     subpopulation difference scanning
SNP     single nucleotide polymorphism
STR     short tandem repeat
YRI     Yoruba in Ibadan, Nigeria

ABSTRACT

The genetic structure of populations is of interest not only as a source of population history information, but also because of its importance to gene mapping studies. The main aim of this thesis was to study the genetic structure of the human populations in present-day Finland and Sweden. Although both populations have been studied with small numbers of genetic markers, the present analyses were the first to utilize genome-wide data from thousands of single nucleotide polymorphism (SNP) markers. Furthermore, this thesis introduced a novel gene mapping approach, subpopulation difference scanning (SDS), and tested its theoretical applicability to the Finnish population.
The study subjects included 280 Finnish and 1525 Swedish individuals. Of the Finns, 141 had grandparents born in western and 139 in eastern Finland. For the Swedes, the geographic information was based on their places of residence, scattered throughout the country roughly according to population density. The Finns were genotyped for 238,000 SNPs on the Affymetrix 250K StyI array and the Swedes for 550,000 SNPs on the Illumina HumanHap550 array. Most analyses were based on subsets of the 29,000 SNPs common to both platforms, and none used imputed genotypes. Genotypes from Russian, German, British and other populations served as reference data. The amount and patterns of genetic variation between populations and individuals were analyzed by standard population genetic methods using, for example, allele frequencies, identity-by-state (IBS) similarities, FST distances, and linkage disequilibrium (LD).
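As a rough illustration of three of the quantities named above, the following Python sketch computes an allele frequency, a pairwise IBS similarity, and a simple two-population FST from genotypes coded as 0, 1, or 2 copies of one allele. The function names, the genotype coding, and the toy data are assumptions made for this example. The FST shown is a basic heterozygosity-based form with equal population weights, and is not necessarily the estimator used in the thesis.

```python
from typing import Sequence


def allele_frequency(genotypes: Sequence[int]) -> float:
    """Frequency of the counted allele in one population at one SNP.

    Each genotype is the number of copies (0, 1, or 2) carried by a
    diploid individual, so the sample holds 2 * n alleles in total.
    """
    return sum(genotypes) / (2 * len(genotypes))


def ibs_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean proportion of alleles shared identical by state between
    two individuals, averaged over the SNPs typed in both."""
    shared = sum(2 - abs(x - y) for x, y in zip(a, b))
    return shared / (2 * len(a))


def fst_two_populations(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Heterozygosity-based FST between two equally weighted populations.

    p1 and p2 hold per-SNP allele frequencies. The result is a ratio of
    sums over SNPs: sum(H_T - H_S) / sum(H_T).
    """
    numerator = 0.0
    denominator = 0.0
    for f1, f2 in zip(p1, p2):
        mean = (f1 + f2) / 2
        h_total = 2 * mean * (1 - mean)          # expected heterozygosity, pooled
        h_within = (2 * f1 * (1 - f1) + 2 * f2 * (1 - f2)) / 2  # mean within
        numerator += h_total - h_within
        denominator += h_total
    # If every SNP is monomorphic overall, FST is undefined; return 0.0 here.
    return numerator / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: rows are SNPs, entries are individuals.
    west = [[0, 1, 1, 2], [2, 2, 1, 2]]
    east = [[0, 0, 1, 0], [1, 2, 1, 1]]
    p_west = [allele_frequency(snp) for snp in west]
    p_east = [allele_frequency(snp) for snp in east]
    print("FST:", fst_two_populations(p_west, p_east))
    print("IBS:", ibs_similarity([0, 1, 2, 2], [0, 2, 0, 2]))
```

On real genome-wide data these calculations would normally be done with dedicated software that also handles missing genotypes and sample-size corrections, which this sketch ignores.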