Genetics: Early Online, published on August 17, 2016 as 10.1534/genetics.116.189241

Nationwide Genomic Study in Denmark Reveals Remarkable Population Homogeneity

Georgios Athanasiadis,*,†,1 Jade Y. Cheng,* Bjarni J. Vilhjálmsson,*,‡ Frank G.

Jørgensen,§ Thomas D. ,**,††,‡‡ Stephanie Le Hellard,§§,*** Thomas Espeseth,†††,‡‡‡

Patrick F. Sullivan,§§§,****,†††† Christina M. Hultman,§§§ Peter C. Kjærgaard,†,‡‡‡‡,2

Mikkel H. Schierup*,†,§§§§ and Thomas Mailund,*,†

*Bioinformatics Research Centre, Aarhus University, Aarhus, 8000, Denmark

†Centre for Biocultural History, Aarhus University, Aarhus, 8000, Denmark

‡Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA

§Tørring Gymnasium, Tørring, 7160, Denmark

**Department of Biomedicine, Aarhus University, Aarhus, 8000, Denmark

††The Initiative for Integrative Psychiatric Research (iPSYCH), Aarhus University, Aarhus, 8000, Denmark

‡‡Center for Integrative Sequencing, iSEQ, Aarhus University, Aarhus, 8000, Denmark

§§Dr E. Martens Research Group of Biological Psychiatry, Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, 5021, Norway

Copyright 2016.

***NORMENT, KG Jebsen Centre for Psychosis Research, Department of Clinical Science, University of Bergen, Bergen, 5021, Norway

†††NORMENT, KG Jebsen Centre for Psychosis Research, Department of Psychology, University of Oslo, Oslo, 0424, Norway

‡‡‡NORMENT, KG Jebsen Centre for Psychosis Research, Division of Mental Health and Addiction, Oslo University Hospital, Oslo, 0424, Norway

§§§Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, 17177, Sweden

****Department of Genetics, University of North Carolina, Chapel Hill, NC 27599-7264, USA

††††Department of Psychiatry, University of North Carolina, Chapel Hill, NC 27599-7264, USA

‡‡‡‡Department of Culture and Society, Aarhus University, Aarhus, 8000, Denmark

§§§§Department of Bioscience, Aarhus University, Aarhus, 8000, Denmark

1Corresponding author: Bioinformatics Research Centre, Aarhus University, Aarhus,

8000, Denmark. E-mail: [email protected].

2Present address: The Natural History Museum of Denmark, University of Copenhagen, Copenhagen, 1471, Denmark.

Running title

The population structure of Denmark

Key words

Human population genetics, population structure, admixture, polygenic risk score, genetic prediction

Corresponding author

Name: Dr. Georgios Athanasiadis

Mailing address: Bioinformatics Research Centre, C.F. Møllers Allé, Building 1110,

Aarhus University, Aarhus, 8000, Denmark

Phone number: (+45) 87 15 55 69

E-mail: [email protected]

Abstract

Denmark has played a substantial role in the history of Northern Europe. Through a nationwide scientific outreach initiative, we collected genetic and anthropometrical data from ~800 high school students and used them to elucidate the genetic makeup of the Danish population, as well as to assess polygenic predictions of phenotypic traits in adolescents. We observed remarkable homogeneity across different geographic regions, although we could still detect weak signals of genetic structure reflecting the history of the country. Denmark presented genomic affinity with primarily neighboring countries with overall resemblance of decreasing weight from Britain, Sweden, Norway, Germany and France. A Polish admixture signal was detected in Zealand and Funen, and our date estimates coincided with historical evidence of Wend settlements in the south of Denmark. We also observed considerably diverse demographic histories among Scandinavian countries, with Denmark having the smallest current effective population size compared to Norway and Sweden. Finally, we found that polygenic prediction of self-reported adolescent height in the population was remarkably accurate (R² = 0.639±0.015). The high homogeneity of the Danish population could render population structure a lesser concern for the upcoming large-scale gene-mapping studies in the country.

Introduction

Denmark has played a substantial role in the history of Northern Europe. Like Swedes and Norwegians, Danes are historically linked to the Vikings – the Germanic Norse seafarers whose commercial and military operations marked the Viking Age in European history (793-1066). Through a series of invasions, conquests and alliances, the Danish Vikings established settlements as far away as Britain, Estonia, the Faroe Islands, Iceland, and even Canada. Denmark’s long-lasting historical bond with Sweden and Norway is nicely exemplified in the formation of the Kalmar Union, a state that brought together the three Scandinavian nations from 1397 to 1523 as a response to Germany’s expansionist tendency to the North (Derry 2000).

Denmark’s historical complexity is contrasted by the country’s small current size – both in terms of geographic area (~43,000 km²) and population (5.614 million inhabitants) – a fact even further contrasted by the notable compartmentalization of the Danish linguistic map into about 30 dialectal areas (http://dialekt.ku.dk/). These observations, together with the evident lack of strong geographic barriers in the country, make present-day Denmark an interesting setting for studying the genetic history of small populations with a glorious past.

In recent years, there has been an explosion of human genetic studies that contributed substantially to the characterization of worldwide variation patterns (Li et al. 2008; Lao et al. 2008; Novembre et al. 2008; Reich et al. 2009; Busby et al. 2015); the reconstruction of population history in regions with poor/nonexistent historical records (Moreno-Estrada et al. 2014; Moreno-Mayar et al. 2014); the study of local and global patterns of admixture in multiethnic societies (Bryc et al. 2015); and the study of admixture with other hominin species as well as its use in elucidating human dispersals (Green et al. 2010; Reich et al. 2011). The increased power of high-throughput genotype data and computational methods has also boosted the emergence of many single-country genomic projects in Europe (Jakkula et al. 2008; Price et al. 2009; Genome of the Netherlands Consortium 2014; Leslie et al. 2015; Karakachoff et al. 2015) and the release of useful data to the public (Welter et al. 2014).

In this work, we extend the collection of single-country genetic studies by introducing a new project from Denmark. Unlike previous genomic projects involving Denmark (Bae et al. 2013; Larsen et al. 2013), ours was conceived from the beginning as a scientific outreach initiative with benefits for both the general public and our research objectives. We invited high school students from across Denmark to participate in activities whose primary goal was to promote genomic literacy in secondary education (Athanasiadis et al. 2016). Most participants donated a DNA sample, which we used to explore the extent to which recent and more distant historical events left their mark on the genetic makeup of the Danish population. Apart from the population structure and demographic history in Denmark, an additional focus of this work has been the quantitative study of basic anthropometrical traits.

With this work, we ultimately report back to our DNA donors the biological and historical insights we gained from analyzing their genetic data. Our results showed remarkable homogeneity across different geographic regions in Denmark, though we were still able to detect weak signals of genetic structure. Denmark received substantial admixture contributions primarily from neighboring countries in the past ten centuries with overall influence of decreasing weight from Britain, Sweden and Norway. We also observed considerably diverse demographic histories among the Scandinavian countries, as reflected in their historical effective population size. Finally, we found that human height can be predicted with remarkable accuracy in Danish adolescents. Overall, this work showcases how far we can go with the analysis of genetic data from small, fairly homogeneous populations.

Materials and Methods

Sample description

New data:- We recruited ~800 students from 36 high schools from across Denmark under the Where Are You From? project (Athanasiadis et al. 2016). These numbers correspond to the sampling of approximately one in 9,350 Danish inhabitants (Figure 1). We asked each participant to provide a saliva sample for DNA analysis and to answer an online questionnaire about family origin (their own, their parents’ and their grandparents’ place of birth) and basic anthropometrical data (e.g. self-reported height and weight). The institutional review board of Aarhus University approved the study. Because no health-related questions were asked, there was no requirement for additional approval by the University’s medical ethics committee. Informed consent was obtained either from the participants (age > 18 years) or their parents (age < 18 years). We used the 23andMe (Mountain View, CA, USA) DNA analysis service for the genotyping of 723 participants. 23andMe uses a custom HumanOmniExpress-24 BeadChip from Illumina (San Diego, CA, USA). Genetic and phenotypic data have been deposited at the European Genome-Phenome Archive (EGA; https://ega-archive.org) under accession number EGAS00001001868. After excluding duplicate single nucleotide polymorphisms (SNPs), applying a per-locus missingness threshold of 2% with PLINK v1.9 (Chang et al. 2015) and removing SNPs ambiguously mapped to the forward DNA strand, 517,403 unique autosomal SNPs were available for further analysis.
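The per-locus missingness filter described above can be illustrated with a minimal Python sketch. This is not the PLINK implementation; the function names and the in-memory genotype layout (rows are individuals, columns are SNPs, `None` marks a missing call) are assumptions made for illustration, and SNPs are kept when their missing rate does not exceed the 2% threshold.

```python
from typing import List, Optional

# Hypothetical encoding: 0, 1 or 2 alternate alleles; None marks a missing call.
Genotype = Optional[int]


def per_locus_missingness(genotypes: List[List[Genotype]]) -> List[float]:
    """Fraction of missing calls per SNP (rows = individuals, columns = SNPs)."""
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    return [
        sum(1 for row in genotypes if row[j] is None) / n_ind
        for j in range(n_snp)
    ]


def keep_snps(genotypes: List[List[Genotype]], max_missing: float = 0.02) -> List[int]:
    """Column indices of SNPs whose missing rate does not exceed max_missing."""
    rates = per_locus_missingness(genotypes)
    return [j for j, rate in enumerate(rates) if rate <= max_missing]
```

With 100 toy individuals, a SNP with one missing call (1%) would be kept and a SNP with three missing calls (3%) would be dropped.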

[Figure 1 map legend: Capital (14); Zealand (36); Funen (13); South Jutland (37); Central Jutland (89); North Jutland (48); Total (237).]

Figure 1. Location of the 36 Danish high schools participating in the Where Are You From? project colored by geographic origin. Number of samples with at least three grandparents born in each of the six regions is shown in brackets.

Additional data:- To put our study in a broader European context, we included four additional datasets: (i) 2,833 individuals × 227,899 SNPs from the POpulation REference Sample (POPRES) (Nelson et al. 2008); (ii) 494 individuals × 314,526 SNPs from Amyotrophic Lateral Sclerosis (ALS) Finland (Laaksovirta et al. 2010); (iii) 381 individuals × 577,252 SNPs from the Swedish Schizophrenia Study (Ripke et al. 2013); and (iv) 300 individuals × 537,306 SNPs from the Norwegian Cognitive NeuroGenetics (NCNG) sample (Espeseth et al. 2012). We applied separate quality controls to these datasets according to their particularities (protocols available upon request).

Imputation

We carried out genotype imputation within each dataset separately before combining them. We first changed DNA strand orientation of several SNPs in all five datasets to create a uniform forward orientation using SNPFLIP (https://github.com/endrebak/snp-flip) and PLINK. Table S1 shows the number of SNPs that were flipped/removed from each dataset. We then used liftOver from the UCSC Genome Browser to “lift” genome coordinates from the NCBI36/hg18 (March 2006) to the GRCh37/hg19 (February 2009) assembly in the POPRES, ALS Finland and NCNG datasets. We produced “prephased” haplotypes for each dataset with SHAPEIT v2.720 (Delaneau et al. 2012) and used the haplotypes together with the latest 1000 Genomes Phase 3 reference panel (b37, October 2014) for the separate imputation of the five datasets with IMPUTE2 v.2.3.1 (Howie et al. 2012). Finally, we concatenated the imputed data into separate chromosomes and filtered them for “info” ≥ 0.975 with QCTOOL (http://www.well.ox.ac.uk/~gav/qctool/#overview).

Principal component analysis

We ran principal component analysis (PCA) with PLINK to explore population structure in our Danish sample. PCA was run on two different data combinations: (i) POPRES, Where Are You From?, NCNG and the Swedish Schizophrenia Study; and (ii) Where Are You From?, NCNG, the Swedish Schizophrenia Study and Germany from POPRES. To avoid undesirable clustering due to extensive linkage disequilibrium (LD), we thinned the genotypes with PLINK (using a window and step size of 1,500 and 150 SNPs, respectively, and r² threshold = 0.80) and removed SNPs from known high-LD genomic regions (e.g. MHC on chromosome 6).

Chromosome painting, population clustering and admixture

We used a set of LD-based methods – CHROMOPAINTER (Lawson et al. 2012), fineSTRUCTURE (Lawson et al. 2012) and GLOBETROTTER (Hellenthal et al. 2014) – to explore fine-grain population structure and admixture in Denmark. These methods require a set of phased SNP data from “donor” (i.e. the available European samples) and “recipient” populations (i.e. the Danish sample). In brief, the methods detect extended multi-marker haplotypes across the genome, which are organized in pairwise vectors of similarity counts (CHROMOPAINTER). These vectors are then used by an MCMC algorithm (fineSTRUCTURE) to hierarchically cluster individuals into groups that are often geographically, linguistically and/or historically meaningful (Leslie et al. 2015). Admixture proportions are estimated through multiple linear regression on the average proportion of DNA that each recipient copies from each of the donor groups (GLOBETROTTER). Nonparametric standard errors of admixture proportions were calculated with a jackknife procedure described elsewhere (Montinaro et al. 2015). Finally, GLOBETROTTER also measures the decay of association vs. genetic distance between the “chromosome chunks” copied from a given pair of donor groups. This decay is exponential, with a rate equal to the time (in generations) since the admixture occurred (Hellenthal et al. 2014; Busby et al. 2015).
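To make the dating principle concrete, the following Python sketch fits a single exponential decay to a toy co-ancestry curve by ordinary least squares on the log scale. It is a deliberately simplified illustration, not GLOBETROTTER's fitting procedure: the function name is hypothetical, the curve is assumed to be strictly positive and to have no constant offset, genetic distance is assumed to be in Morgans (so that the rate is in generations), and the 30 years per generation used for the conversion is the value assumed later in the Results.

```python
import math
from typing import Sequence, Tuple

YEARS_PER_GENERATION = 30  # same assumption as in the Results section


def fit_decay_rate(dist_morgans: Sequence[float],
                   signal: Sequence[float]) -> Tuple[float, float]:
    """Fit signal = a * exp(-lam * d) via least squares on log(signal).

    Returns (a, lam); lam is interpreted as generations since admixture.
    """
    logs = [math.log(y) for y in signal]
    n = len(dist_morgans)
    mean_x = sum(dist_morgans) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in dist_morgans)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dist_morgans, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def generations_to_years(generations: float) -> float:
    """Convert an admixture date from generations to years."""
    return generations * YEARS_PER_GENERATION
```

On a noiseless toy curve generated with a rate of 20 generations, the fit recovers that rate, which corresponds to 600 years under the stated generation time.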

After jointly phasing 489,209 imputed autosomal SNPs across all five datasets with IMPUTE2, we ran CHROMOPAINTER using default options (Leslie et al. 2015) on three different datasets: (i) Denmark alone; (ii) Europe without Denmark; and (iii) Europe and Denmark together. We previously ran each analysis ten times on a sample subset (~10% of the total number of samples and only for chromosomes 4, 10, 15, and 22) to estimate the switch and global emission rates used by CHROMOPAINTER’s Hidden Markov Model. Once similarity vectors were defined in the three datasets, we used fineSTRUCTURE to explore clustering in recipient (dataset i) and donor (dataset ii) populations. For the GLOBETROTTER analysis, we merged the similarity matrices from datasets ii and iii and ran the program with default options described elsewhere (Leslie et al. 2015; Busby et al. 2015).

Ancestry component analysis

We used Ohana (https://github.com/jade-cheng/ohana) to estimate individual admixture proportions in 13 European countries – mostly Western and Northern – including Denmark. In brief, Ohana is a new model-based method that uses the same likelihood admixture model as do STRUCTURE (Pritchard et al. 2000), FRAPPE (Tang et al. 2005) and ADMIXTURE (Alexander et al. 2009), and Newton’s method for optimization. Compared to established methods, Ohana achieves better likelihood estimates (R. Nielsen, personal communication). After running the algorithm, we reported per-country admixture proportions by averaging individual proportions within each country.
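The final averaging step can be sketched in a few lines of Python. The sketch assumes that individual admixture proportions are already available as one row of K values per individual; the function name and the input layout are hypothetical and are not part of Ohana.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def mean_admixture_by_group(labels: Sequence[str],
                            q_rows: Sequence[Sequence[float]]) -> Dict[str, List[float]]:
    """Average individual admixture proportions within each country or region.

    labels[i] is the group of individual i; q_rows[i] holds that
    individual's K ancestry proportions.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for label, row in zip(labels, q_rows):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for k, value in enumerate(row):
            sums[label][k] += value
        counts[label] += 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}
```

Because each individual's proportions sum to one, the per-group averages also sum to one.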

Relatedness and identity by descent

To explore relatedness in our Danish sample, we used two different methods: KING (Manichaikul et al. 2010) for a formal calculation of pairwise kinship coefficients and BEAGLE Refined IBD (Browning and Browning 2013a) for the inference of DNA segments that were identical by descent (IBD) between pairs of individuals. We analyzed 406 individuals for whom we had complete information that all four of their grandparents were born in Denmark. We excluded from the analysis one individual from pairs of twins and siblings (preferentially the one with highest per-locus genotype missingness).
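The exclusion rule for twins and siblings can be sketched as follows. This is an illustrative reading of the rule, assuming that related pairs have already been identified and that a missingness value is available per sample; the function name and the sample identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def drop_one_per_pair(pairs: Iterable[Tuple[str, str]],
                      missingness: Dict[str, float]) -> Set[str]:
    """Return sample IDs to exclude: for each related pair, the member with
    the higher genotype missingness (the first member on ties)."""
    dropped: Set[str] = set()
    for a, b in pairs:
        if a in dropped or b in dropped:
            continue  # this pair is already broken by an earlier exclusion
        dropped.add(a if missingness[a] >= missingness[b] else b)
    return dropped
```

Skipping pairs that are already broken avoids removing more samples than necessary when one individual appears in several pairs.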

Historical effective population size

We also estimated the historical effective population size (Ne) of the Danish, Swedish and Norwegian populations through the combination of two IBD-based methods. We first applied IBDseq (Browning and Browning 2013b) to the three Scandinavian datasets separately to produce sets of pairwise IBD tracts. We then used IBDNe (Browning and Browning 2015) to estimate Ne from the distribution of the inferred IBD tracts over the past 150 generations for each of the three Scandinavian populations. To maximize power, we used the original SNP data for this analysis. Note that, when SNP data are used, IBDNe estimates are most reliable within the past 50 generations.

Polygenic prediction of height and BMI

We used self-reported height and weight from ~600 students of diverse ethnic backgrounds to perform polygenic risk prediction of height and body mass index (BMI) with LDpred (Vilhjálmsson et al. 2015), a summary statistic-based algorithm that models LD to improve the prediction. As training data for the model, we used public summary statistics from large genome-wide association studies of adult height (Wood et al. 2014) and BMI (Locke et al. 2015). We first removed from our data SNPs with minor allele frequency (MAF) < 0.01, as well as SNPs with a MAF different from the one reported in the training data by a factor of 0.15. We then assessed SNP effect under different fractions of causal variants: p = {1, 0.5, 0.2, 0.1, 0.05}, whereby p = 1 corresponds to the infinitesimal model (Visscher et al. 2008). As an LD reference, we used a subset of 407 unrelated individuals (kinship coefficient < 0.05) who had all four of their grandparents born in Denmark. Finally, we validated LDpred’s prediction of height in 578 individuals, as well as the prediction of BMI in 572 individuals. We calculated R² – the proportion of phenotypic variance explained by the predictor – by (i) adjusting for age, sex and the first ten PCs, and (ii) actually including age, sex and the first ten PCs in the model. PC adjustment ensures that genetic predictions are not influenced by ancestry.
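The two ways of computing R² can be illustrated with a small, dependency-free Python sketch. It reflects one plausible reading of steps (i) and (ii), namely (i) regressing the covariates out of the phenotype and then regressing the residuals on the polygenic score, and (ii) fitting the score and the covariates in a single model. It is not LDpred code, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def ols_fitted(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fitted values from regressing y on the columns of X plus an intercept."""
    rows = [[1.0, *r] for r in X]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return [sum(b * v for b, v in zip(beta, r)) for r in rows]


def r_squared(y: Sequence[float], fitted: Sequence[float]) -> float:
    """Proportion of the variance of y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1.0 - ss_res / ss_tot


def adjusted_and_full_r2(score: Sequence[float],
                         covars: Sequence[Sequence[float]],
                         pheno: Sequence[float]) -> Tuple[float, float]:
    """Return (R2 after covariate adjustment, R2 of the full model)."""
    # (i) Remove covariate effects from the phenotype, then use the score alone.
    cov_fit = ols_fitted(covars, pheno)
    resid = [v - f for v, f in zip(pheno, cov_fit)]
    r2_adj = r_squared(resid, ols_fitted([[s] for s in score], resid))
    # (ii) Score and covariates together in one model.
    full = [[s, *c] for s, c in zip(score, covars)]
    r2_full = r_squared(pheno, ols_fitted(full, pheno))
    return r2_adj, r2_full
```

In practice the covariate rows would hold age, sex and the first ten PCs; a linear-algebra library would be the usual choice for the regression, and the hand-written solver is used here only to keep the sketch self-contained.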

Results

Principal component analysis

To put Danes in a European genetic context, we first ran PCA on 3,858 samples from across Europe. In particular, we extended a previously published PCA within Europe (Novembre et al. 2008) to include sizeable samples from Denmark, Norway and Sweden, as these were underrepresented in the original study. In our PCA, Denmark was represented by 407 students who had all four of their grandparents born in the country. Our Danish samples clustered in a geographically meaningful manner along the first two principal axes, partially overlapping with Norwegians and Swedes, and showing close genetic proximity to samples from Great Britain, the Netherlands, Germany and (Figure 2A).

[Figure 2 plot area. Panels A and B: scatter plots of PC1 vs. PC2 with country and region labels; panel C: correlation coefficient of PC1 and PC2 vs. rotation (°), with the maximum marked at 32°.]

Figure 2. (A) PCA of 105,672 imputed SNPs after merging four datasets: Where Are You From?, POPRES, NCNG and the Swedish Schizophrenia study without outliers clustering with Finland (total N = 3,858). Per-country box plots (median and interquartile range) of PC values were added to facilitate interpretation. Whiskers represent data within 1.5 times the interquartile range. IE: Ireland; ES: Spain; PT: Portugal; GB: Great Britain; FR: France; BE: Belgium; CH: Switzerland; NL: the Netherlands; DK: Denmark; DE: Germany; NO: Norway; SE: Sweden; AT: Austria; IT: Italy; PL: Poland; HU: Hungary; CZ: Czech Republic; HR: Croatia; RO: Romania; YU: Yugoslavia; GR: Greece. (B) PCA of 105,672 imputed SNPs from Denmark, Sweden, Norway and Germany with emphasis on the six geographic regions of Denmark (total N = 1,168). No clear genetic-geographic relationship was observed. Ca: Capital; Ze: Zealand; Fu: Funen; SJ: South Jutland; CJ: Central Jutland; NJ: North Jutland. (C) Correlation of PC1 and PC2 with average grandparent place-of-birth latitude along a 360° clockwise rotation. Maximum correlation was observed for PC1 at 32° (r ≈ 0.24; p < 10⁻⁵).

We then looked for fine-scale genetic structure within Denmark by focusing the analysis on 237 students who had at least three of their grandparents born in just one of the following six regions: Capital Region; Zealand; Funen; South, Central and North Jutland (Figure 2B). These six groups correspond to the five administrative regions of Denmark (we further split the administrative region of South Denmark into South Jutland and Funen). After rerunning PCA for Denmark, Sweden, Norway and Germany alone (N = 1,168), we observed no geographically meaningful clustering of the 237 Danish samples (Figure 2B). This lack of strong genetic structure was also supported by the low average FST value between the six regions (FST = 0.0002). To check whether structure was simply too weak to be visually detected, we calculated for each Danish sample the average geographic coordinates of their grandparents’ place of birth and regressed the resulting values on PC1 and PC2 eigenvectors. We repeated the procedure by gradually rotating the map clockwise and found a weak yet significant correlation between PC1 and latitude along a northwest-southeast axis at 32° (r ≈ 0.24; p < 10⁻⁵; Figure 2B-C).
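The rotation scan can be sketched in Python as follows. The sketch treats longitude and latitude as planar coordinates and rotates the points clockwise about the origin in 1° steps; both choices, as well as the function names, are assumptions made for illustration and may differ from the procedure actually used.

```python
import math
from typing import Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def rotated_latitude(lon: float, lat: float, degrees: float) -> float:
    """North-south coordinate of a point after a clockwise rotation of the map."""
    t = math.radians(degrees)
    return -lon * math.sin(t) + lat * math.cos(t)


def scan_rotations(pc: Sequence[float], lons: Sequence[float],
                   lats: Sequence[float], step: int = 1) -> Tuple[int, float]:
    """Return (angle, r) for the rotation angle that maximizes the correlation
    between the principal component and the rotated latitude."""
    best_angle, best_r = 0, -2.0
    for angle in range(0, 360, step):
        rotated = [rotated_latitude(x, y, angle) for x, y in zip(lons, lats)]
        r = pearson(pc, rotated)
        if r > best_r:
            best_angle, best_r = angle, r
    return best_angle, best_r
```

Here `pc` would hold the PC1 (or PC2) value of each sample, and `lons` and `lats` the average coordinates of the grandparents' places of birth.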

Chromosome painting, population clustering and admixture

To further explore historical genetic interactions between Denmark and neighboring countries, we ran CHROMOPAINTER, fineSTRUCTURE and GLOBETROTTER on a subset of the studied European populations. We used 2,745 individuals from 13 European countries (Norway, Sweden, Finland, the Netherlands, Germany, Poland, Austria, Hungary, France, Belgium, Great Britain, Spain and Portugal) and the 237 individuals from the six geographic regions in Denmark (Figure 1) to quantify and date European admixture in each of the six Danish groups. We used these six groups for our analysis because we were unable to observe an alternative clustering: fineSTRUCTURE clustered all Danish samples in a single large group (data not shown), reflecting again the weak genetic structure in the Danish population.

After running CHROMOPAINTER and fineSTRUCTURE on the European donor samples, these were organized in the eight major clusters seen in Figure 3A: Norwegian (NOR), Swedish (SWE), Finnish (FIN), British (BRI), French (FRA), German (GER), Polish (POL) and Iberian (IBE). There was always one predominant country in the makeup of each of those clusters, with the exception of IBE, in which Spain and Portugal were present at almost equal proportions, and GER, in which samples from Austria, Hungary and the Netherlands were also present in large numbers. For convenience, we named the eight clusters after the predominant country/region in each of them. We treated these clusters as ancestry components (“surrogate populations” in GLOBETROTTER jargon) and used GLOBETROTTER to define their contribution to each of the six Danish regions and to date the corresponding admixture events.

Figure 3. (A) fineSTRUCTURE grouping of the 2,745 European donor samples into eight clusters roughly corresponding to well-defined geographic locations. The tree on the left illustrates cluster topology. Pie charts on the European map show the relative contribution from each country to each of the eight inferred clusters. FIN: Finnish; NOR: Norwegian; SWE: Swedish; POL: Polish; GER: German; BRI: British; FRA: French; IBE: Iberian. Color legend at the bottom of the map shows different donor countries. FI: Finland. (B) GLOBETROTTER admixture proportions of each of the eight European clusters in the six geographic regions of Denmark. Neither FIN nor IBE made substantial contributions (< 2.5%) to the mixture profiles of Denmark.

Figure 3B and Table S2 show that the Swedish, Norwegian and British clusters made the most substantial contribution to the ancestry profiles of all six Danish groups, jointly accounting for 82.05-97.12% of the total admixture. The Scandinavian component (Swedish and Norwegian clusters together) surprisingly accounted for less than half of the total admixture (range: 41.69-48.96%), on a par with the British component (40.37-48.16%), which peaked in South Jutland. Interestingly, the contribution of the Swedish cluster alone (27.45-30.20%) was almost twice as large as that of the Norwegian cluster (14.24-18.76%). This difference could be explained by the reduced landscape connectivity between Norway and Denmark, affecting gene flow patterns. It is also striking that the German cluster had little genetic influence on Denmark (3.04-6.78%), despite the proximity and historically fluid borders between the two countries. However, the latter observation could be due to the marginally higher genetic affinity of Denmark with Britain, which could result in prioritizing British over German samples in the initial chromosome-painting step of our analysis. Similarly, the French component was present at small yet considerable proportions (4.16-5.42%) in all Danish regions except for Funen and South Jutland. Finally, it is worth noting that there was a small Polish contribution to Zealand (6.28%) and Funen (5.03%).

We further used GLOBETROTTER in a preliminary bootstrap re-sampling step within each of the six Danish regions to produce admixture dates. A proportion of nonsensical dates (i.e. ≤ 1 generation or ≥ 400 generations) above the empirical p-value threshold (0.05) suggests that GLOBETROTTER cannot reliably date the corresponding admixture event. Following this procedure, we found that we did not have enough power to date admixture in the Capital region or in Funen (data not shown). For the remaining four regions, we used GLOBETROTTER’s pairwise co-ancestry curves (Figures S1-4) to define in more detail the type (i.e. one date vs. one date multi-way vs. two dates) and actual date of admixture, as well as a tentative composition of the admixing source populations. In all regions, GLOBETROTTER’s best fitting admixture model (R² = 0.136-0.354) involved one admixture event with more than two sources admixing simultaneously (Table 1). Assuming 30 years per generation, the date of admixture ranged from ~434 to ~943 years before present. Notably, the algorithm identified two clusters as the most recurring admixing sources: one showing highest affinity with GER and one with POL.

Ancestry component analysis

We also used Ohana to provide an independent estimate of individual admixture proportions in the same set of 12 European countries (without Finland) and six Danish regions. For this purpose, we assumed that each of the European samples was the result of admixture between K ancestral populations (i.e. ancestry components). Even though we ran the analysis for K = {2, …, 9} (Figure S5), we report the results for K = 4, because this value corresponds to four distinct geographic origins of our samples: the Iberian Peninsula; Eastern, Central and Nordic Europe. The analysis returned an admixture pattern that was consistent with geography (i.e. North-South and East-West clines; Table 2; Figure 4). In particular, we observed (i) a Southern European component (blue), which was predominant in the Iberian peninsula but was also found at large proportions in France; (ii) an Eastern European component (yellow), which was predominant in Poland but was also found at large proportions in neighboring countries (and notably in South and East Denmark, Sweden, Norway and Great Britain); (iii) a Nordic component (green), which was predominant in Scandinavia and to a lesser extent in Germany; and (iv) a Central European component (red), at higher proportions in Scandinavia and the Netherlands but also present in most European countries (except for Iberia). Within Denmark, we observed that Zealand and South Denmark had larger membership to the Eastern European component (18.90-20.36%) compared to Central and North Jutland (~13%), resembling GLOBETROTTER’s signal of Polish admixture observed in Zealand and Funen (Figure 3B; Table S2). The Southern and Eastern European components were robust to different choices of K, whereas the Central and Nordic European components were gradually divided into smaller fractions as K went up, showing no clear geographic patterns (Figure S5).

Figure 4. Ancestral component analysis of 13 European countries (including six well-defined geographic regions in Denmark shown in the inset) assuming K = 4 ancestral populations. Bar plot at the bottom shows per-individual membership to each of the four ancestral components, whereas pie charts on the map summarize per-country (or per-region for Denmark) admixture proportions. Based on their preponderance in different parts of Europe, we interpret the four components as (i) Southern European (blue); (ii) Eastern European (yellow); (iii) Nordic (green); and (iv) Central European (red). Regions from South and East Denmark show higher proportion of Eastern European ancestry in accord with Figure 3B and Table S2.

Relatedness and identity by descent

We followed up the evidence for weak genetic structure in Denmark by exploring the degree and geographic distribution of relatedness among our Danish samples. Using KING, we found that the vast majority of the Danish students were distantly related (4th degree or more distant), with only four pairs of individuals showing 2nd or 3rd degree relationships (apart from the twins and siblings initially removed; Figure S6). Because of the inherent uncertainty in distinguishing between different degrees of distantly related individuals, we did not attempt to further stratify the samples by kinship coefficient.

After establishing that most participants in our sample were either unrelated or distantly related (Figure S6), we examined DNA segments that were IBD between pairs of individuals with BEAGLE Refined IBD. Using total genomic length of IBD tracts as a proxy for relatedness, we traced each individual’s closest “genomic relative” within Denmark without explicitly quantifying the degree of relationship. Figure 5A shows the distribution of the geographic distance of each of 399 individuals from their closest genomic relative (pink) and from a randomly chosen sample (green). The distribution of the distance from the closest relative (pink) showed an enrichment for very short distances, i.e. less than ~50 km, as well as a median value of 99.3 km – significantly closer than expected by chance at 131.4 km (Mann-Whitney U = 60,997; p < 10⁻⁸). This points at a weak yet significant tendency of participants to live close to genetically similar individuals.
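For readers unfamiliar with the test, the following Python sketch computes a Mann-Whitney U statistic and a two-sided p-value from the large-sample normal approximation. It is a textbook illustration only: it applies no tie or continuity correction, the function names are hypothetical, and the software and conventions actually used for the reported value are not specified here.

```python
import math
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic for x: number of (x_i, y_j) pairs with x_i > y_j,
    counting ties as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def normal_approx_p(u: float, n1: int, n2: int) -> float:
    """Two-sided p-value from the normal approximation (no tie correction)."""
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))
```

With two samples of 399 distances each, there are 399 × 399 = 159,201 pairs, so a U value far below half of that number indicates that one sample tends to have smaller values than the other.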

[Figure 5 plot area. Panel A: density vs. distance (km) for the closest relative and a random sample, with the two medians marked; panel B: median geographic distance (km) vs. rank of genomic relatedness.]

Figure 5. (A) Distribution of geographic distance of each participant’s place of birth (N = 399) to that of their closest “genomic relative” (pink), and to that of a randomly chosen sample (green). Genomic relatedness was defined on the grounds of total genomic IBD. Arrows point at median values of the two distributions (median_rltv = 99.3 km; median_rand = 131.4 km). Seven individuals with unknown geographic coordinates were excluded from this analysis. (B) Plot of rank of genomic relatedness vs. median geographic distance of each participant to their closest genomic relative. We created 57 equally sized bins of individuals increasingly related to their closest genomic relative (seven individuals per bin). Alternative bin sizes also produced significantly negative correlations (data not shown).

To gain more insight into the latter observation, we grouped our Danish samples into ranked bins by the amount of total genomic IBD shared with their closest genomic relative (the higher the rank, the closer the relationship). We then calculated the median geographic distance of each participant to their closest relative within each bin and regressed this value against bin rank (Figure 5B), observing a weak yet significant negative correlation (r ≈ -0.35; p < 0.01). This indicates that geographic distance tends to be significantly shorter between Danes who share more genomic IBD. Interestingly, this signal was not picked up by kinship coefficient (data not shown).
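The binning-and-correlation step can be illustrated with a short sketch. It assumes each individual is represented by a hypothetical `(total_ibd, distance_km)` pair; the function names and toy values are invented, and the Pearson coefficient is computed directly rather than through whatever regression software the authors used.

```python
import statistics


def binned_median_distance(records, bin_size):
    """Bin individuals by relatedness and return per-bin median distances.

    `records` is a list of (total_ibd, distance_km) tuples. Individuals are
    sorted by total IBD and cut into consecutive bins of `bin_size`; rank 1
    is the least related bin. A trailing incomplete bin is dropped.
    """
    ordered = sorted(records, key=lambda r: r[0])
    n_bins = len(ordered) // bin_size
    out = []
    for b in range(n_bins):
        chunk = ordered[b * bin_size:(b + 1) * bin_size]
        out.append((b + 1, statistics.median(d for _, d in chunk)))
    return out


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Toy, made-up records (not data from the study).
toy_records = [(10, 200), (12, 180), (15, 150), (20, 120), (30, 60), (45, 20)]
bins = binned_median_distance(toy_records, bin_size=2)
r = pearson_r([rank for rank, _ in bins], [dist for _, dist in bins])
```

In the study's setting, 399 individuals in bins of seven give the 57 ranks plotted in Figure 5B.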

Historical effective population size

Historical Ne showed notable disparity between the three Scandinavian countries (Figure 6). In particular, point estimates of Ne in Denmark, Norway and Sweden showed a dramatic ~273-fold, ~262-fold and ~995-fold increase, respectively, over the past 150 generations (~4,500±300 years), following the general upward European trend (Browning and Browning 2015). What is more, Sweden’s and Denmark’s Ne estimates were significantly different from each other (no overlap between the corresponding 95% confidence intervals). During the High and Late Middle Ages, Ne in Denmark remained stable, suffering an almost imperceptible decline as a consequence of the otherwise devastating Black Death, and similar patterns were observed for Norway and Sweden. From the 15th century on, Ne in Denmark started to rise again at a moderate rate, whereas Ne in Norway and Sweden rose at an even higher rate, resulting in Norway’s Ne eventually surpassing that of Denmark.

Interestingly, even though Denmark and Norway currently have very similar census population sizes (5.614 and 5.084 million, respectively), Norway’s current Ne is 1.76 times larger than Denmark’s (although the difference is not significant). Finally, the upward trend of Denmark’s Ne did not seem to be held back by other important epidemics in the history of the country, such as cholera, the Spanish flu or, more recently, polio (Figure 6).
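The arithmetic behind two of the figures quoted in this section can be written out explicitly. The generation length of 30±2 years is the one stated in the caption of Figure 6; the superscripts DK and NO are shorthand introduced here for Denmark and Norway.

```latex
\begin{align*}
T &= 150 \times (30 \pm 2)\ \text{years} = 4{,}500 \pm 300\ \text{years},\\[4pt]
\frac{N_{\text{census}}^{\text{DK}}}{N_{\text{census}}^{\text{NO}}}
  &= \frac{5.614}{5.084} \approx 1.10,
\qquad
\frac{N_e^{\text{NO}}}{N_e^{\text{DK}}} \approx 1.76 .
\end{align*}
```

In other words, Denmark's census population is about 10% larger than Norway's, while its current point estimate of Ne is smaller.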

[Figure 6 image. x-axis: Year (tick labels: 2800 - 2200 BC, 133 BC - 133 AD, 934 - 1066, 1467 - 1533, 1787 - 1813, 1894 - 1906, Present); y-axis: Ne × 1,000; series: Sweden, Norway, Denmark; marked epidemics: Black Death (1350), Cholera (1853), Spanish Flu (1918), Polio (1952); inset: same graph with both axes in log-scale.]

Figure 6. Change in effective population size (Ne) of the Danish, Swedish and Norwegian populations over the past 150 generations (log-scale). Shaded areas represent the upper and lower bounds of the 95% confidence intervals after bootstrapping. Uncertainty in generation length is represented by year intervals on the x-axis, assuming that each generation lasts 30±2 years. Black segments represent major epidemics from the recent history of the Danish population and are plotted taking into account generation length uncertainty. For more clarity, the inset shows the same graph with both axes in log-scale.

Polygenic prediction of height and BMI

Polygenic risk prediction in our Danish sample was overall far more accurate for height than for BMI (Figure S7). In the case of height, we observed maximum accuracy when assuming infinitesimal genetic architecture and adjusting for age, sex and ten ancestry PCs – R² = 0.251±0.031 (Figure S7A) vs. ~0.17 elsewhere (Wood et al. 2014). When age, sex and the ten PCs were also included in the model, the prediction accuracy rose substantially to 0.639±0.015 (p = 6.57×10⁻⁷¹), a fact reflected in the strong and significant correlation between real and predicted height (Figure S7C). In contrast, although the maximum accuracy of BMI prediction was also observed when we adjusted for age, sex and ten PCs, this was overall much poorer than for height – R² = 0.106±0.037 (Figure S7B) vs. ~0.065 elsewhere (Locke et al. 2015). In this case, when age, sex and ten PCs were included in the model, the improvement of accuracy was not as notable as for height (R² = 0.195±0.034, p = 5.49×10⁻¹⁸), implying that BMI is only marginally affected by age and sex (Figure S7D).

Discussion

The most notable observation of our study is the high genetic homogeneity of the Danish population, possibly reflecting geographically uninterrupted gene flow, further facilitated by the extended network of sea-based commerce and travelling (Derry 2000). This observation should be appreciated in a historical context, as our analysis did not account for genetic contributions from recent immigration. Indeed, modern Danish society accommodates different ethnic and cultural groups (Jensen and Pedersen 2007), and this was also reflected in our sample, where ~4% of the participants were born in Afghanistan, China, Ethiopia, Finland, Germany, Greenland, Iraq, Jordan, Korea, Kosovo, the Netherlands or Zambia. This number went up when we looked at grandparental origin, where 14.6% of the participants had at least one grandparent born outside Denmark.

The historical homogeneity of the Danish population is reflected in the lack of PCA clustering (Figure 2B), as well as in the notably low average FST between the six geographic regions. To put this observation in context, a previous study was able to detect subtle population structure along a north-south axis in the Netherlands (Genome of the Netherlands Consortium 2014), a country of almost identical area to Denmark’s. Even though we were able to detect significant correlation between PC1 and latitude (Figure 2C), the signal was less compelling than that from the Dutch study. Similarly, a recent study of population structure in Great Britain (Leslie et al. 2015) found that the average pairwise FST estimate between 30 geographic regions was 0.0007 – 3.5 times higher than the value we report here (0.0002).

To better understand how our FST estimates compare to those from Great Britain, we calculated average FST values in two British population subsets using FST values reported elsewhere (Leslie et al. 2015). The two subsets included regions with geographic distances comparable to those in Denmark. The first subset comprised 13 populations from Central/Southern England, with a resulting average FST < 0.0002. The second subset comprised nine populations from Northern England, with a resulting average pairwise FST = 0.0003.
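Averaging pairwise FST over a subset of regions, as done above for the two British subsets, amounts to a mean over all unordered pairs. The sketch below assumes a hypothetical dictionary keyed by region pairs; the region labels and FST values are made up and are not taken from either study. Only the final ratio uses the two averages quoted in the text.

```python
from itertools import combinations


def mean_pairwise_fst(fst, regions):
    """Average FST over all unordered pairs drawn from `regions`.

    `fst` maps frozenset({region_a, region_b}) to an FST estimate.
    """
    pairs = list(combinations(regions, 2))
    if not pairs:
        raise ValueError("need at least two regions")
    return sum(fst[frozenset(p)] for p in pairs) / len(pairs)


# Made-up values for three hypothetical regions (not from either study).
toy_fst = {
    frozenset(("A", "B")): 0.0001,
    frozenset(("A", "C")): 0.0003,
    frozenset(("B", "C")): 0.0002,
}
toy_mean = mean_pairwise_fst(toy_fst, ["A", "B", "C"])

# Ratio of the two reported averages quoted in the text (0.0007 vs. 0.0002).
ratio_gb_to_dk = 0.0007 / 0.0002
```

For scale, 13 regions yield 78 pairs and nine regions yield 36 pairs, so each subset average rests on a few dozen pairwise values.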

In general, the study of admixture within the European continent is confounded by a well-grounded isolation-by-distance mechanism (Lao et al. 2008; Novembre et al. 2008), as well as an increased historical complexity that renders most admixture models unrealistically simple (Busby et al. 2015). Denmark is no exception to these caveats. Even though it is tempting to explain the admixture proportions in Figures 3 and 4 as the result of historical admixing events, an alternative approach is to interpret such proportions as “mixture profiles”. Bearing this in mind, we see that the mixture profiles of all six Danish groups comprise two major ancestry components, one predominant in Scandinavia and the other predominant in Central Europe (Figure 4).

In the GLOBETROTTER analysis, these two components were identified as admixture contributions from the Swedish/Norwegian and the British clusters (Figure 3B; Table S2). In regard to the British contribution, however, this is actually more likely to reflect admixture in the opposite direction, i.e. from Denmark to Britain (Leslie et al. 2015).

Even though the mixture profiles of the six regions in Denmark were overall quite similar, they differed in their membership in the Eastern European cluster. This particular cluster was more prevalent in South and East Denmark (Figure 4) and was independently observed in the GLOBETROTTER analysis as a small Polish contribution to Zealand and Funen (Figure 3B; Table S2). This signal could be due to historical Wend settlements in the broader area around in the southernmost part of Denmark (Derry 2000), an observation that is also supported by our admixture date estimates (Table 1). However, because similar mixture profiles were also observed in Sweden and Norway (Figure 4), it is difficult to choose between isolation by distance and actual admixture.

Given the high variance in the estimated admixture dates, the recurrence of the same combinations of admixing sources, and the largely unstructured nature of our Danish sample, we could interpret several of the admixture patterns in Table 1 as different instances of the same event. For instance, focusing on the two surrogates that GLOBETROTTER found best tagged by POL and GER, we see that the admixture between them first occurred in Zealand in the 11th century, spreading to the rest of peninsular Denmark afterwards. This observation fits historical knowledge about the march of the Wends towards Denmark, as mentioned above.

A weak signal of population structure was also observed when we studied the geographic distribution of IBD-based relatedness. The median distance to the closest genomic relative was significantly smaller than to a randomly chosen sample, but it represents a minor effect (median difference ≈ 30 km; Figure 5A). In our regression analysis, we observed that this distance tended to be significantly smaller for pairs of individuals that shared more genomic IBD, yet the correlation was overall modest (r ≈ -0.35; Figure 5B). These observations indicate that genetic structure indeed exists in Denmark, even though it is very weak.

It is striking that Denmark, Sweden and Norway – three closely related Scandinavian countries – seem to differ considerably in their demographic history, as reflected in their historical Ne trajectories (Figure 6). Indeed, the current Sweden-to-Norway census population size ratio is 1.89 – fairly close to their current Ne ratio (2.24) – implying that the two populations have had similar reproductive variance and/or population structure. In contrast, Denmark’s Ne seems to have increased at a consistently lower rate after the Middle Ages. This could reflect weaker reproductive dynamics in the Danish population or weaker population structure compared to Sweden and Norway. Larger sample sizes are warranted to shed more light on the historical Ne trajectories in Scandinavian populations.

Finally, we found that self-reported adolescent height could be predicted with remarkable accuracy using essentially nothing but information derived directly (genotypes) or indirectly (sex and ancestry) from DNA available at birth. When we combined SNP data with age, sex and PCA information, our predictor could explain more than half of the total phenotypic variance (63.9%). The remaining unexplained variance corresponded to a standard deviation of 5.43 cm. This means that, with 95% confidence, we are at most ~10.65 cm off in our prediction of adolescent height (Figure S7C). It is worth noting that adding age to the model did not yield significant improvements to the prediction accuracy of height (data not shown), implying that adolescent height is a reliable measurement for the validation of the prediction. In addition, even though height was a self-reported measurement, its strong correlation with predicted adult height suggests that the students provided reliable personal information (Figure S7C). As for the poorer prediction of BMI, it is important to remember that data training was carried out in adults, whereas prediction was validated in adolescents. Therefore, further improvement of the BMI prediction as subjects advance in age is a possibility.
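The step from the residual standard deviation to the 95% error bound for height can be checked with a few lines of arithmetic. The sketch assumes normally distributed prediction errors and that the reported R² and residual standard deviation refer to the same model and sample; the implied total standard deviation is a derived quantity that the text does not report, and the constant names are invented.

```python
import math

R2_HEIGHT = 0.639      # proportion of variance explained, from the text
RESIDUAL_SD_CM = 5.43  # residual standard deviation in cm, from the text

# Half-width of an approximate 95% interval under normal residuals.
half_width_cm = 1.96 * RESIDUAL_SD_CM

# Total phenotypic SD implied by residual_sd = total_sd * sqrt(1 - R^2).
implied_total_sd_cm = RESIDUAL_SD_CM / math.sqrt(1.0 - R2_HEIGHT)
```

The half-width evaluates to about 10.64 cm, in line with the ~10.65 cm quoted above; the small gap is presumably due to rounding of the reported standard deviation.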

In conclusion, our analysis showed that, by applying the simple criterion of participants having all four of their grandparents born in Denmark, we obtained a largely unstructured sample with very low FST values among different geographic regions. This high homogeneity has the potential to render population structure in large-scale Danish gene-mapping studies such as iPSYCH (http://ipsych.au.dk/) a lesser concern, regardless of adjustments such as genomic control, PCA (Price et al. 2006) or mixed models (Yu et al. 2006). We also observed a notably disparate demographic history in Denmark compared to other Scandinavian countries – a fact that can also be ascribed to the high homogeneity of the Danish population – and that height in adolescents could be predicted with considerable accuracy. Last but not least, this study stands as an example of how large-scale public engagement projects can provide mutual benefits for both the general public and the scientific community through the promotion of scientific knowledge.

Acknowledgements

The authors would like to thank the high school students and their teachers for participating in the Where Are You From? project, as well as Dagmar Hedvig Fog Bjerre and Pia Johansson Nielsen for logistic support with organizing and shipping the DNA kits to the participating schools. Special thanks to Prof. Jes Fabricius Møller for helping us gain useful insights into the history and demography of the Danish population. G.A. would like to thank Garrett Hellenthal, Francesco Montinaro, George Busby, Sharon Browning and Christopher Gignoux for their help and useful discussions. This project and related research were supported by the National Lottery Funds (Danske Spil), The Danish Ministry of Education and the Centre for Biocultural History, Aarhus University. B.J.V. was supported by a grant from the Danish Council for Independent Research (DFF-1325-0014). K.M.H. was supported by grants from the Swedish Research Council (K2014-62X-21445-05-3 and K2012-63X-21445-03-2).

Literature Cited

Alexander D. H., Novembre J., Lange K., 2009 Fast model-based estimation of ancestry in unrelated individuals.

Genome Res. 19: 1655–1664.

Athanasiadis G., Jørgensen F. G., Cheng J. Y., Kjærgaard P. C., Schierup M. H., Mailund T., 2016 Spitting for

science: Danish high school students commit to a large-scale self-reported genetic study. PloS One.

Bae H. T., Sebastiani P., Sun J. X., Andersen S. L., Daw E. W., Terracciano A., Ferrucci L., Perls T. T., 2013

Genome-wide association study of personality traits in the long life family study. Front. Genet. 4: 65.

Browning B. L., Browning S. R., 2013a Improving the accuracy and efficiency of identity-by-descent detection in

population data. Genetics 194: 459–471.

Browning B. L., Browning S. R., 2013b Detecting identity by descent and estimating genotype error rates in

sequence data. Am. J. Hum. Genet. 93: 840–851.

Browning S. R., Browning B. L., 2015 Accurate Non-parametric Estimation of Recent Effective Population Size

from Segments of Identity by Descent. Am. J. Hum. Genet. 97: 404–418.

Bryc K., Durand E. Y., Macpherson J. M., Reich D., Mountain J. L., 2015 The genetic ancestry of African

Americans, Latinos, and European Americans across the United States. Am. J. Hum. Genet. 96: 37–53.

Busby G. B. J., Hellenthal G., Montinaro F., Tofanelli S., Bulayeva K., Rudan I., Zemunik T., Hayward C.,

Toncheva D., Karachanak-Yankova S., Nesheva D., Anagnostou P., Cali F., Brisighelli F., Romano V.,

Lefranc G., Buresi C., Ben Chibani J., Haj-Khelil A., Denden S., Ploski R., Krajewski P., Hervig T.,

Moen T., Herrera R. J., Wilson J. F., Myers S., Capelli C., 2015 The Role of Recent Admixture in

Forming the Contemporary West Eurasian Genomic Landscape. Curr. Biol. CB 25: 2518–2526.

Chang C. C., Chow C. C., Tellier L. C., Vattikuti S., Purcell S. M., Lee J. J., 2015 Second-generation PLINK:

rising to the challenge of larger and richer datasets. GigaScience 4: 7.

Delaneau O., Marchini J., Zagury J.-F., 2012 A linear complexity phasing method for thousands of genomes. Nat.

Methods 9: 179–181.

Derry T. K., 2000 History of Scandinavia: Norway, Sweden, Denmark, Finland, and Iceland. University of

Minnesota Press.

Devlin B., Roeder K., 1999 Genomic control for association studies. Biometrics 55: 997–1004.

Espeseth T., Christoforou A., Lundervold A. J., Steen V. M., Le Hellard S., Reinvang I., 2012 Imaging and

cognitive genetics: the Norwegian Cognitive NeuroGenetics sample. Twin Res. Hum. Genet. Off. J. Int.

Soc. Twin Stud. 15: 442–452.

Genome of the Netherlands Consortium, 2014 Whole-genome sequence variation, population structure and

demographic history of the Dutch population. Nat. Genet. 46: 818–825.

Green R. E., Krause J., Briggs A. W., Maricic T., Stenzel U., Kircher M., Patterson N., Li H., Zhai W., Fritz M.

H.-Y., Hansen N. F., Durand E. Y., Malaspinas A.-S., Jensen J. D., Marques-Bonet T., Alkan C., Prüfer

K., Meyer M., Burbano H. A., Good J. M., Schultz R., Aximu-Petri A., Butthof A., Höber B., Höffner

B., Siegemund M., Weihmann A., Nusbaum C., Lander E. S., Russ C., Novod N., Affourtit J., Egholm

M., Verna C., Rudan P., Brajkovic D., Kucan Z., Gusic I., Doronichev V. B., Golovanova L. V.,

Lalueza-Fox C., Rasilla M. de la, Fortea J., Rosas A., Schmitz R. W., Johnson P. L. F., Eichler E. E.,

Falush D., Birney E., Mullikin J. C., Slatkin M., Nielsen R., Kelso J., Lachmann M., Reich D., Pääbo S.,

2010 A draft sequence of the Neandertal genome. Science 328: 710–722.

Hellenthal G., Busby G. B. J., Band G., Wilson J. F., Capelli C., Falush D., Myers S., 2014 A genetic atlas of

human admixture history. Science 343: 747–751.

Howie B., Fuchsberger C., Stephens M., Marchini J., Abecasis G. R., 2012 Fast and accurate genotype imputation

in genome-wide association studies through pre-phasing. Nat. Genet. 44: 955–959.

Jakkula E., Rehnström K., Varilo T., Pietiläinen O. P. H., Paunio T., Pedersen N. L., deFaire U., Järvelin M.-R.,

Saharinen J., Freimer N., Ripatti S., Purcell S., Collins A., Daly M. J., Palotie A., Peltonen L., 2008 The

genome-wide patterns of variation expose significant substructure in a founder population. Am. J. Hum.

Genet. 83: 787–794.

Jensen P., Pedersen P. J., 2007 To Stay or Not to Stay? Out-Migration of Immigrants from Denmark. Int. Migr. 45:

87–113.

Karakachoff M., Duforet-Frebourg N., Simonet F., Le Scouarnec S., Pellen N., Lecointe S., Charpentier E., Gros

F., Cauchi S., Froguel P., Copin N., D.E.S.I.R. Study Group, Le Tourneau T., Probst V., Le Marec H.,

Molinaro S., Balkau B., Redon R., Schott J.-J., Blum M. G., Dina C., D E S I R Study Group, 2015 Fine-

scale human genetic structure in Western France. Eur. J. Hum. Genet. EJHG 23: 831–836.

Laaksovirta H., Peuralinna T., Schymick J. C., Scholz S. W., Lai S.-L., Myllykangas L., Sulkava R., Jansson L.,

Hernandez D. G., Gibbs J. R., Nalls M. A., Heckerman D., Tienari P. J., Traynor B. J., 2010

Chromosome 9p21 in amyotrophic lateral sclerosis in Finland: a genome-wide association study. Lancet

Neurol. 9: 978–985.

Lao O., Lu T. T., Nothnagel M., Junge O., Freitag-Wolf S., Caliebe A., Balascakova M., Bertranpetit J., Bindoff L.

A., Comas D., Holmlund G., Kouvatsi A., Macek M., Mollet I., Parson W., Palo J., Ploski R., Sajantila

A., Tagliabracci A., Gether U., Werge T., Rivadeneira F., Hofman A., Uitterlinden A. G., Gieger C.,

Wichmann H.-E., Rüther A., Schreiber S., Becker C., Nürnberg P., Nelson M. R., Krawczak M., Kayser

M., 2008 Correlation between genetic and geographic structure in Europe. Curr. Biol. CB 18: 1241–

1248.

Larsen M. H., Albrechtsen A., Thørner L. W., Werge T., Hansen T., Gether U., Haastrup E., Ullum H., 2013

Genome-Wide Association Study of Genetic Variants in LPS-Stimulated IL-6, IL-8, IL-10, IL-1ra and

TNF-α Cytokine Response in a Danish Cohort. PloS One 8: e66262.

Lawson D. J., Hellenthal G., Myers S., Falush D., 2012 Inference of population structure using dense haplotype

data. PLoS Genet. 8: e1002453.

Leslie S., Winney B., Hellenthal G., Davison D., Boumertit A., Day T., Hutnik K., Royrvik E. C., Cunliffe B.,

Wellcome Trust Case Control Consortium 2, International Multiple Sclerosis Genetics Consortium,

Lawson D. J., Falush D., Freeman C., Pirinen M., Myers S., Robinson M., Donnelly P., Bodmer W.,

2015 The fine-scale genetic structure of the British population. Nature 519: 309–314.

Li J. Z., Absher D. M., Tang H., Southwick A. M., Casto A. M., Ramachandran S., Cann H. M., Barsh G. S.,

Feldman M., Cavalli-Sforza L. L., Myers R. M., 2008 Worldwide human relationships inferred from

genome-wide patterns of variation. Science 319: 1100–1104.

Locke A. E., Kahali B., Berndt S. I., Justice A. E., Pers T. H., Day F. R., Powell C., Vedantam S., Buchkovich M.

L., Yang J., Croteau-Chonka D. C., Esko T., Fall T., Ferreira T., Gustafsson S., Kutalik Z., Luan J. ’an,

Mägi R., Randall J. C., Winkler T. W., Wood A. R., Workalemahu T., Faul J. D., Smith J. A., Hua Zhao

J., Zhao W., Chen J., Fehrmann R., Hedman Å. K., Karjalainen J., Schmidt E. M., Absher D., Amin N.,

Anderson D., Beekman M., Bolton J. L., Bragg-Gresham J. L., Buyske S., Demirkan A., Deng G., Ehret

G. B., Feenstra B., Feitosa M. F., Fischer K., Goel A., Gong J., Jackson A. U., Kanoni S., Kleber M. E.,

Kristiansson K., Lim U., Lotay V., Mangino M., Mateo Leach I., Medina-Gomez C., Medland S. E.,

Nalls M. A., Palmer C. D., Pasko D., Pechlivanis S., Peters M. J., Prokopenko I., Shungin D.,

Stančáková A., Strawbridge R. J., Ju Sung Y., Tanaka T., Teumer A., Trompet S., Laan S. W. van der,

Setten J. van, Van Vliet-Ostaptchouk J. V., Wang Z., Yengo L., Zhang W., Isaacs A., Albrecht E.,

Ärnlöv J., Arscott G. M., Attwood A. P., Bandinelli S., Barrett A., Bas I. N., Bellis C., Bennett A. J.,

Berne C., Blagieva R., Blüher M., Böhringer S., Bonnycastle L. L., Böttcher Y., Boyd H. A.,

Bruinenberg M., Caspersen I. H., Ida Chen Y.-D., Clarke R., Daw E. W., Craen A. J. M. de, Delgado G.,

Dimitriou M., Doney A. S. F., Eklund N., Estrada K., Eury E., Folkersen L., Fraser R. M., Garcia M. E.,

Geller F., Giedraitis V., Gigante B., Go A. S., Golay A., Goodall A. H., Gordon S. D., Gorski M., Grabe

H.-J., Grallert H., Grammer T. B., Gräßler J., Grönberg H., Groves C. J., Gusto G., Haessler J., Hall P.,

Haller T., Hallmans G., Hartman C. A., Hassinen M., Hayward C., Heard-Costa N. L., Helmer Q.,

Hengstenberg C., Holmen O., Hottenga J.-J., James A. L., Jeff J. M., Johansson Å., Jolley J., Juliusdottir

T., Kinnunen L., Koenig W., Koskenvuo M., Kratzer W., Laitinen J., Lamina C., Leander K., Lee N. R.,

Lichtner P., Lind L., Lindström J., Sin Lo K., Lobbens S., Lorbeer R., Lu Y., Mach F., Magnusson P. K.

E., Mahajan A., McArdle W. L., McLachlan S., Menni C., Merger S., Mihailov E., Milani L., Moayyeri

A., Monda K. L., Morken M. A., Mulas A., Müller G., Müller-Nurasyid M., Musk A. W., Nagaraja R.,

Nöthen M. M., Nolte I. M., Pilz S., Rayner N. W., Renstrom F., Rettig R., Ried J. S., Ripke S.,

Robertson N. R., Rose L. M., Sanna S., Scharnagl H., Scholtens S., Schumacher F. R., Scott W. R.,

Seufferlein T., Shi J., Vernon Smith A., Smolonska J., Stanton A. V., Steinthorsdottir V., Stirrups K.,

Stringham H. M., Sundström J., Swertz M. A., Swift A. J., Syvänen A.-C., Tan S.-T., Tayo B. O.,

Thorand B., Thorleifsson G., Tyrer J. P., Uh H.-W., Vandenput L., Verhulst F. C., Vermeulen S. H.,

Verweij N., Vonk J. M., Waite L. L., Warren H. R., Waterworth D., Weedon M. N., Wilkens L. R.,

Willenborg C., Wilsgaard T., Wojczynski M. K., Wong A., Wright A. F., Zhang Q., LifeLines Cohort

Study, Brennan E. P., Choi M., Dastani Z., Drong A. W., Eriksson P., Franco-Cereceda A., Gådin J. R.,

Gharavi A. G., Goddard M. E., Handsaker R. E., Huang J., Karpe F., Kathiresan S., Keildson S., Kiryluk

K., Kubo M., Lee J.-Y., Liang L., Lifton R. P., Ma B., McCarroll S. A., McKnight A. J., Min J. L.,

Moffatt M. F., Montgomery G. W., Murabito J. M., Nicholson G., Nyholt D. R., Okada Y., Perry J. R.

B., Dorajoo R., Reinmaa E., Salem R. M., Sandholm N., Scott R. A., Stolk L., Takahashi A., Tanaka T.,

Van’t Hooft F. M., Vinkhuyzen A. A. E., Westra H.-J., Zheng W., Zondervan K. T., ADIPOGen

Consortium, AGEN-BMI Working Group, CARDIOGRAMplusC4D Consortium, CKDGen

Consortium, GLGC, ICBP, MAGIC Investigators, MuTHER Consortium, MIGen Consortium, PAGE

Consortium, ReproGen Consortium, GENIE Consortium, International Endogene Consortium, Heath A.

C., Arveiler D., Bakker S. J. L., Beilby J., Bergman R. N., Blangero J., Bovet P., Campbell H., Caulfield

M. J., Cesana G., Chakravarti A., Chasman D. I., Chines P. S., Collins F. S., Crawford D. C., Cupples L.

A., Cusi D., Danesh J., Faire U. de, Ruijter H. M. den, Dominiczak A. F., Erbel R., Erdmann J., Eriksson

J. G., Farrall M., Felix S. B., Ferrannini E., Ferrières J., Ford I., Forouhi N. G., Forrester T., Franco O.

H., Gansevoort R. T., Gejman P. V., Gieger C., Gottesman O., Gudnason V., Gyllensten U., Hall A. S.,

Harris T. B., Hattersley A. T., Hicks A. A., Hindorff L. A., Hingorani A. D., Hofman A., Homuth G.,

Hovingh G. K., Humphries S. E., Hunt S. C., Hyppönen E., Illig T., Jacobs K. B., Jarvelin M.-R., Jöckel

K.-H., Johansen B., Jousilahti P., Jukema J. W., Jula A. M., Kaprio J., Kastelein J. J. P., Keinanen-

Kiukaanniemi S. M., Kiemeney L. A., Knekt P., Kooner J. S., Kooperberg C., Kovacs P., Kraja A. T.,

Kumari M., Kuusisto J., Lakka T. A., Langenberg C., Le Marchand L., Lehtimäki T., Lyssenko V.,

Männistö S., Marette A., Matise T. C., McKenzie C. A., McKnight B., Moll F. L., Morris A. D., Morris

A. P., Murray J. C., Nelis M., Ohlsson C., Oldehinkel A. J., Ong K. K., Madden P. A. F., Pasterkamp G.,

Peden J. F., Peters A., Postma D. S., Pramstaller P. P., Price J. F., Qi L., Raitakari O. T., Rankinen T.,

Rao D. C., Rice T. K., Ridker P. M., Rioux J. D., Ritchie M. D., Rudan I., Salomaa V., Samani N. J.,

Saramies J., Sarzynski M. A., Schunkert H., Schwarz P. E. H., Sever P., Shuldiner A. R., Sinisalo J.,

Stolk R. P., Strauch K., Tönjes A., Trégouët D.-A., Tremblay A., Tremoli E., Virtamo J., Vohl M.-C.,

Völker U., Waeber G., Willemsen G., Witteman J. C., Zillikens M. C., Adair L. S., Amouyel P.,

Asselbergs F. W., Assimes T. L., Bochud M., Boehm B. O., Boerwinkle E., Bornstein S. R., Bottinger E.

P., Bouchard C., Cauchi S., Chambers J. C., Chanock S. J., Cooper R. S., Bakker P. I. W. de, Dedoussis

G., Ferrucci L., Franks P. W., Froguel P., Groop L. C., Haiman C. A., Hamsten A., Hui J., Hunter D. J.,

Hveem K., Kaplan R. C., Kivimaki M., Kuh D., Laakso M., Liu Y., Martin N. G., März W., Melbye M.,

Metspalu A., Moebus S., Munroe P. B., Njølstad I., Oostra B. A., Palmer C. N. A., Pedersen N. L.,

Perola M., Pérusse L., Peters U., Power C., Quertermous T., Rauramaa R., Rivadeneira F., Saaristo T.

E., Saleheen D., Sattar N., Schadt E. E., Schlessinger D., Slagboom P. E., Snieder H., Spector T. D.,

Thorsteinsdottir U., Stumvoll M., Tuomilehto J., Uitterlinden A. G., Uusitupa M., Harst P. van der,

Walker M., Wallaschofski H., Wareham N. J., Watkins H., Weir D. R., Wichmann H.-E., Wilson J. F.,

Zanen P., Borecki I. B., Deloukas P., Fox C. S., Heid I. M., O’Connell J. R., Strachan D. P., Stefansson

K., Duijn C. M. van, Abecasis G. R., Franke L., Frayling T. M., McCarthy M. I., Visscher P. M.,

Scherag A., Willer C. J., Boehnke M., Mohlke K. L., Lindgren C. M., Beckmann J. S., Barroso I., North

K. E., Ingelsson E., Hirschhorn J. N., Loos R. J. F., Speliotes E. K., 2015 Genetic studies of body mass

index yield new insights for obesity biology. Nature 518: 197–206.

Manichaikul A., Mychaleckyj J. C., Rich S. S., Daly K., Sale M., Chen W.-M., 2010 Robust relationship inference

in genome-wide association studies. Bioinforma. Oxf. Engl. 26: 2867–2873.

Montinaro F., Busby G. B. J., Pascali V. L., Myers S., Hellenthal G., Capelli C., 2015 Unravelling the hidden

ancestry of American admixed populations. Nat. Commun. 6: 6596.

Moreno-Estrada A., Gignoux C. R., Fernández-López J. C., Zakharia F., Sikora M., Contreras A. V., Acuña-

Alonzo V., Sandoval K., Eng C., Romero-Hidalgo S., Ortiz-Tello P., Robles V., Kenny E. E., Nuño-

Arana I., Barquera-Lozano R., Macín-Pérez G., Granados-Arriola J., Huntsman S., Galanter J. M., Via

M., Ford J. G., Chapela R., Rodriguez-Cintron W., Rodríguez-Santana J. R., Romieu I., Sienra-Monge J.

J., Rio Navarro B. del, London S. J., Ruiz-Linares A., Garcia-Herrera R., Estrada K., Hidalgo-Miranda

A., Jimenez-Sanchez G., Carnevale A., Soberón X., Canizales-Quinteros S., Rangel-Villalobos H.,

Silva-Zolezzi I., Burchard E. G., Bustamante C. D., 2014 Human genetics. The genetics of Mexico

recapitulates Native American substructure and affects biomedical traits. Science 344: 1280–1285.

Moreno-Mayar J. V., Rasmussen S., Seguin-Orlando A., Rasmussen M., Liang M., Flåm S. T., Lie B. A., Gilfillan

G. D., Nielsen R., Thorsby E., Willerslev E., Malaspinas A.-S., 2014 Genome-wide ancestry patterns in

Rapanui suggest pre-European admixture with Native Americans. Curr. Biol. CB 24: 2518–2525.

Nelson M. R., Bryc K., King K. S., Indap A., Boyko A. R., Novembre J., Briley L. P., Maruyama Y., Waterworth

D. M., Waeber G., Vollenweider P., Oksenberg J. R., Hauser S. L., Stirnadel H. A., Kooner J. S.,

Chambers J. C., Jones B., Mooser V., Bustamante C. D., Roses A. D., Burns D. K., Ehm M. G., Lai E.

H., 2008 The Population Reference Sample, POPRES: a resource for population, disease, and

pharmacological genetics research. Am. J. Hum. Genet. 83: 347–358.

Novembre J., Johnson T., Bryc K., Kutalik Z., Boyko A. R., Auton A., Indap A., King K. S., Bergmann S., Nelson

M. R., Stephens M., Bustamante C. D., 2008 Genes mirror geography within Europe. Nature 456: 98–

101.

Price A. L., Patterson N. J., Plenge R. M., Weinblatt M. E., Shadick N. A., Reich D., 2006 Principal components

analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38: 904–909.

Price A. L., Helgason A., Palsson S., Stefansson H., St Clair D., Andreassen O. A., Reich D., Kong A., Stefansson

K., 2009 The impact of divergence time on the nature of population structure: an example from Iceland.

PLoS Genet. 5: e1000505.

Pritchard J. K., Stephens M., Donnelly P., 2000 Inference of population structure using multilocus genotype data.

Genetics 155: 945–959.

Reich D., Thangaraj K., Patterson N., Price A. L., Singh L., 2009 Reconstructing Indian population history. Nature

461: 489–494.

Reich D., Patterson N., Kircher M., Delfin F., Nandineni M. R., Pugach I., Ko A. M.-S., Ko Y.-C., Jinam T. A.,

Phipps M. E., Saitou N., Wollstein A., Kayser M., Pääbo S., Stoneking M., 2011 Denisova admixture

and the first modern human dispersals into Southeast Asia and Oceania. Am. J. Hum. Genet. 89: 516–

528.

Ripke S., O’Dushlaine C., Chambert K., Moran J. L., Kähler A. K., Akterin S., Bergen S. E., Collins A. L.,

Crowley J. J., Fromer M., Kim Y., Lee S. H., Magnusson P. K. E., Sanchez N., Stahl E. A., Williams S.,

Wray N. R., Xia K., Bettella F., Borglum A. D., Bulik-Sullivan B. K., Cormican P., Craddock N., Leeuw

C. de, Durmishi N., Gill M., Golimbet V., Hamshere M. L., Holmans P., Hougaard D. M., Kendler K. S.,

Lin K., Morris D. W., O., Mortensen P. B., Neale B. M., O’Neill F. A., Owen M. J., Milovancevic

M. P., Posthuma D., Powell J., Richards A. L., Riley B. P., Ruderfer D., Rujescu D., Sigurdsson E.,

Silagadze T., Smit A. B., Stefansson H., Steinberg S., Suvisaari J., Tosato S., Verhage M., Walters J. T.,

Multicenter Genetic Studies of Schizophrenia Consortium, Levinson D. F., Gejman P. V., Kendler K. S.,

Laurent C., Mowry B. J., O’Donovan M. C., Owen M. J., Pulver A. E., Riley B. P., Schwab S. G.,

Wildenauer D. B., Dudbridge F., Holmans P., Shi J., Albus M., Alexander M., Campion D., Cohen D.,

Dikeos D., Duan J., Eichhammer P., Godard S., Hansen M., Lerer F. B., Liang K.-Y., Maier W., Mallet

J., Nertney D. A., Nestadt G., Norton N., O’Neill F. A., Papadimitriou G. N., Ribble R., Sanders A. R.,

Silverman J. M., Walsh D., Williams N. M., Wormley B., Psychosis Endophenotypes International

Consortium, Arranz M. J., Bakker S., Bender S., Bramon E., Collier D., Crespo-Facorro B., Hall J.,

Iyegbe C., Jablensky A., Kahn R. S., Kalaydjieva L., Lawrie S., Lewis C. M., Lin K., Linszen D. H.,

Mata I., McIntosh A., Murray R. M., Ophoff R. A., Powell J., Rujescu D., Van Os J., Walshe M.,

Weisbrod M., Wiersma D., Wellcome Trust Case Control Consortium 2, Donnelly P., Barroso I.,

Blackwell J. M., Bramon E., Brown M. A., Casas J. P., Corvin A. P., Deloukas P., Duncanson A.,

Jankowski J., Markus H. S., Mathew C. G., Palmer C. N. A., Plomin R., Rautanen A., Sawcer S. J.,

Trembath R. C., Viswanathan A. C., Wood N. W., Spencer C. C. A., Band G., Bellenguez C., Freeman

C., Hellenthal G., Giannoulatou E., Pirinen M., Pearson R. D., Strange A., Su Z., Vukcevic D., Donnelly

P., Langford C., Hunt S. E., Edkins S., Gwilliam R., Blackburn H., Bumpstead S. J., Dronov S., Gillman

M., Gray E., Hammond N., Jayakumar A., McCann O. T., Liddle J., Potter S. C., Ravindrarajah R.,

Ricketts M., Tashakkori-Ghanbaria A., Waller M. J., Weston P., Widaa S., Whittaker P., Barroso I.,

Deloukas P., Mathew C. G., Blackwell J. M., Brown M. A., Corvin A. P., McCarthy M. I., Spencer C. C.

A., Bramon E., Corvin A. P., O’Donovan M. C., Stefansson K., Scolnick E., Purcell S., McCarroll S. A.,

Sklar P., Hultman C. M., Sullivan P. F., 2013 Genome-wide association analysis identifies 13 new risk

loci for schizophrenia. Nat. Genet. 45: 1150–1159.

Tang H., Peng J., Wang P., Risch N. J., 2005 Estimation of individual admixture: analytical and study design

considerations. Genet. Epidemiol. 28: 289–301.

Vilhjálmsson B. J., Yang J., Finucane H. K., Gusev A., Lindström S., Ripke S., Genovese G., Loh P.-R., Bhatia

G., Do R., Hayeck T., Won H.-H., Schizophrenia Working Group of the Psychiatric Genomics

Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study,

Kathiresan S., Pato M., Pato C., Tamimi R., Stahl E., Zaitlen N., Pasaniuc B., Belbin G., Kenny E. E.,

Schierup M. H., De Jager P., Patsopoulos N. A., McCarroll S., Daly M., Purcell S., Chasman D., Neale

B., Goddard M., Visscher P. M., Kraft P., Patterson N., Price A. L., 2015 Modeling Linkage Disequilibrium Increases

Accuracy of Polygenic Risk Scores. Am. J. Hum. Genet. 97: 576–592.

Visscher P. M., Hill W. G., Wray N. R., 2008 Heritability in the genomics era--concepts and misconceptions. Nat.

Rev. Genet. 9: 255–266.

Welter D., MacArthur J., Morales J., Burdett T., Hall P., Junkins H., Klemm A., Flicek P., Manolio T., Hindorff

L., Parkinson H., 2014 The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.

Nucleic Acids Res. 42: D1001–1006.

Wood A. R., Esko T., Yang J., Vedantam S., Pers T. H., Gustafsson S., Chu A. Y., Estrada K., Luan J. ’an, Kutalik

Z., Amin N., Buchkovich M. L., Croteau-Chonka D. C., Day F. R., Duan Y., Fall T., Fehrmann R.,

Ferreira T., Jackson A. U., Karjalainen J., Lo K. S., Locke A. E., Mägi R., Mihailov E., Porcu E.,

Randall J. C., Scherag A., Vinkhuyzen A. A. E., Westra H.-J., Winkler T. W., Workalemahu T., Zhao J.

H., Absher D., Albrecht E., Anderson D., Baron J., Beekman M., Demirkan A., Ehret G. B., Feenstra B.,

Feitosa M. F., Fischer K., Fraser R. M., Goel A., Gong J., Justice A. E., Kanoni S., Kleber M. E.,

Kristiansson K., Lim U., Lotay V., Lui J. C., Mangino M., Mateo Leach I., Medina-Gomez C., Nalls M.

A., Nyholt D. R., Palmer C. D., Pasko D., Pechlivanis S., Prokopenko I., Ried J. S., Ripke S., Shungin

D., Stancáková A., Strawbridge R. J., Sung Y. J., Tanaka T., Teumer A., Trompet S., Laan S. W. van

der, Setten J. van, Van Vliet-Ostaptchouk J. V., Wang Z., Yengo L., Zhang W., Afzal U., Arnlöv J.,

Arscott G. M., Bandinelli S., Barrett A., Bellis C., Bennett A. J., Berne C., Blüher M., Bolton J. L.,

Böttcher Y., Boyd H. A., Bruinenberg M., Buckley B. M., Buyske S., Caspersen I. H., Chines P. S.,

Clarke R., Claudi-Boehm S., Cooper M., Daw E. W., De Jong P. A., Deelen J., Delgado G., Denny J. C.,

Dhonukshe-Rutten R., Dimitriou M., Doney A. S. F., Dörr M., Eklund N., Eury E., Folkersen L., Garcia

M. E., Geller F., Giedraitis V., Go A. S., Grallert H., Grammer T. B., Gräßler J., Grönberg H., Groot L.

C. P. G. M. de, Groves C. J., Haessler J., Hall P., Haller T., Hallmans G., Hannemann A., Hartman C.

A., Hassinen M., Hayward C., Heard-Costa N. L., Helmer Q., Hemani G., Henders A. K., Hillege H. L.,

Hlatky M. A., Hoffmann W., Hoffmann P., Holmen O., Houwing-Duistermaat J. J., Illig T., Isaacs A.,

James A. L., Jeff J., Johansen B., Johansson Å., Jolley J., Juliusdottir T., Junttila J., Kho A. N.,

Kinnunen L., Klopp N., Kocher T., Kratzer W., Lichtner P., Lind L., Lindström J., Lobbens S.,

Lorentzon M., Lu Y., Lyssenko V., Magnusson P. K. E., Mahajan A., Maillard M., McArdle W. L.,

McKenzie C. A., McLachlan S., McLaren P. J., Menni C., Merger S., Milani L., Moayyeri A., Monda K.

L., Morken M. A., Müller G., Müller-Nurasyid M., Musk A. W., Narisu N., Nauck M., Nolte I. M.,

Nöthen M. M., Oozageer L., Pilz S., Rayner N. W., Renstrom F., Robertson N. R., Rose L. M., Roussel

R., Sanna S., Scharnagl H., Scholtens S., Schumacher F. R., Schunkert H., Scott R. A., Sehmi J.,

Seufferlein T., Shi J., Silventoinen K., Smit J. H., Smith A. V., Smolonska J., Stanton A. V., Stirrups K.,

Stott D. J., Stringham H. M., Sundström J., Swertz M. A., Syvänen A.-C., Tayo B. O., Thorleifsson G.,

Tyrer J. P., Dijk S. van, Schoor N. M. van, Velde N. van der, Heemst D. van, Oort F. V. A. van,

Vermeulen S. H., Verweij N., Vonk J. M., Waite L. L., Waldenberger M., Wennauer R., Wilkens L. R.,

Willenborg C., Wilsgaard T., Wojczynski M. K., Wong A., Wright A. F., Zhang Q., Arveiler D., Bakker

S. J. L., Beilby J., Bergman R. N., Bergmann S., Biffar R., Blangero J., Boomsma D. I., Bornstein S. R.,

Bovet P., Brambilla P., Brown M. J., Campbell H., Caulfield M. J., Chakravarti A., Collins R., Collins F.

S., Crawford D. C., Cupples L. A., Danesh J., Faire U. de, Ruijter H. M. den, Erbel R., Erdmann J.,

Eriksson J. G., Farrall M., Ferrannini E., Ferrières J., Ford I., Forouhi N. G., Forrester T., Gansevoort R.

T., Gejman P. V., Gieger C., Golay A., Gottesman O., Gudnason V., Gyllensten U., Haas D. W., Hall A.

S., Harris T. B., Hattersley A. T., Heath A. C., Hengstenberg C., Hicks A. A., Hindorff L. A., Hingorani

A. D., Hofman A., Hovingh G. K., Humphries S. E., Hunt S. C., Hypponen E., Jacobs K. B., Jarvelin

M.-R., Jousilahti P., Jula A. M., Kaprio J., Kastelein J. J. P., Kayser M., Kee F., Keinanen-Kiukaanniemi

S. M., Kiemeney L. A., Kooner J. S., Kooperberg C., Koskinen S., Kovacs P., Kraja A. T., Kumari M.,

Kuusisto J., Lakka T. A., Langenberg C., Le Marchand L., Lehtimäki T., Lupoli S., Madden P. A. F.,

Männistö S., Manunta P., Marette A., Matise T. C., McKnight B., Meitinger T., Moll F. L., Montgomery

G. W., Morris A. D., Morris A. P., Murray J. C., Nelis M., Ohlsson C., Oldehinkel A. J., Ong K. K.,

Ouwehand W. H., Pasterkamp G., Peters A., Pramstaller P. P., Price J. F., Qi L., Raitakari O. T.,

Rankinen T., Rao D. C., Rice T. K., Ritchie M., Rudan I., Salomaa V., Samani N. J., Saramies J.,

Sarzynski M. A., Schwarz P. E. H., Sebert S., Sever P., Shuldiner A. R., Sinisalo J., Steinthorsdottir V.,

Stolk R. P., Tardif J.-C., Tönjes A., Tremblay A., Tremoli E., Virtamo J., Vohl M.-C., Electronic

Medical Records and Genomics (eMERGE) Consortium, MIGen Consortium, PAGE

Consortium, LifeLines Cohort Study, Amouyel P., Asselbergs F. W., Assimes T. L., Bochud M., Boehm

B. O., Boerwinkle E., Bottinger E. P., Bouchard C., Cauchi S., Chambers J. C., Chanock S. J., Cooper R.

S., Bakker P. I. W. de, Dedoussis G., Ferrucci L., Franks P. W., Froguel P., Groop L. C., Haiman C. A.,

Hamsten A., Hayes M. G., Hui J., Hunter D. J., Hveem K., Jukema J. W., Kaplan R. C., Kivimaki M.,

Kuh D., Laakso M., Liu Y., Martin N. G., März W., Melbye M., Moebus S., Munroe P. B., Njølstad I.,

Oostra B. A., Palmer C. N. A., Pedersen N. L., Perola M., Pérusse L., Peters U., Powell J. E., Power C.,

Quertermous T., Rauramaa R., Reinmaa E., Ridker P. M., Rivadeneira F., Rotter J. I., Saaristo T. E.,

Saleheen D., Schlessinger D., Slagboom P. E., Snieder H., Spector T. D., Strauch K., Stumvoll M.,

Tuomilehto J., Uusitupa M., Harst P. van der, Völzke H., Walker M., Wareham N. J., Watkins H.,

Wichmann H.-E., Wilson J. F., Zanen P., Deloukas P., Heid I. M., Lindgren C. M., Mohlke K. L.,

Speliotes E. K., Thorsteinsdottir U., Barroso I., Fox C. S., North K. E., Strachan D. P., Beckmann J. S.,

Berndt S. I., Boehnke M., Borecki I. B., McCarthy M. I., Metspalu A., Stefansson K., Uitterlinden A. G.,

Duijn C. M. van, Franke L., Willer C. J., Price A. L., Lettre G., Loos R. J. F., Weedon M. N., Ingelsson

E., O’Connell J. R., Abecasis G. R., Chasman D. I., Goddard M. E., Visscher P. M., Hirschhorn J. N.,

Frayling T. M., 2014 Defining the role of common variation in the genomic and biological architecture

of adult human height. Nat. Genet. 46: 1173–1186.

Yu J., Pressoir G., Briggs W. H., Vroh Bi I., Yamasaki M., Doebley J. F., McMullen M. D., Gaut B. S., Nielsen D.

M., Holland J. B., Kresovich S., Buckler E. S., 2006 A unified mixed-model method for association

mapping that accounts for multiple levels of relatedness. Nat. Genet. 38: 203–208.

Supplementary Figure Legends

Figure S1. Co-ancestry curves for all pairs of surrogate populations identified in Zealand (Figure 3B; Table S2).

Figure S2. Co-ancestry curves for all pairs of surrogate populations identified in South Jutland (Figure 3B; Table S2).

Figure S3. Co-ancestry curves for all pairs of surrogate populations identified in Central Jutland (Figure 3B; Table S2).

Figure S4. Co-ancestry curves for all pairs of surrogate populations identified in North Jutland (Figure 3B; Table S2).

Figure S5. Ancestral component analysis of 13 European countries (including six well-defined geographic regions in Denmark shown in the inset) assuming K = {2, …, 9} ancestral populations.

Figure S6. Proportion of zero IBS sharing vs. kinship coefficient for 82,215 pairs of individuals in a sample of 406 high school students who had all four of their grandparents born in Denmark. Following KING’s guidelines, estimated kinship coefficient ranges of (i) > 0.354, (ii) [0.177, 0.354], (iii) [0.0884, 0.177], (iv) [0.0442, 0.0884] and (v) [0, 0.0221] were considered to correspond to duplicate/MZ twin, 1st-degree, 2nd-degree, 3rd-degree and 4th-degree (or further) relationships, respectively. The vast majority of pairs were either unrelated (negative kinship coefficient in the grey-shaded part of the plot) or distantly related (kinship ∈ (0, 0.0221) in the red-shaded part of the plot).

Figure S7. Prediction accuracy of LDpred for height and BMI when age, sex and PCA information were (i) adjusted for (A and C), and (ii) included (B and D) in the model.
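The kinship ranges quoted in the Figure S6 legend can be read as a simple lookup. The sketch below is illustrative only: the function name `classify_kinship` is hypothetical, the thresholds are copied from the legend rather than taken from the KING software, and two choices are assumptions of this sketch. Shared interval endpoints are assigned to the closer relationship class, and values in (0.0221, 0.0442), which the legend does not assign to any class, are reported as such. The pair count follows from choosing 2 individuals out of 406.

```python
from math import comb


def classify_kinship(phi: float) -> str:
    """Map a kinship coefficient to the classes listed in the Figure S6 legend.

    Thresholds are copied from the legend. Shared endpoints go to the closer
    relationship class (an assumption of this sketch).
    """
    if phi > 0.354:
        return "duplicate/MZ twin"
    if phi >= 0.177:
        return "1st-degree"
    if phi >= 0.0884:
        return "2nd-degree"
    if phi >= 0.0442:
        return "3rd-degree"
    if phi > 0.0221:
        # The legend leaves (0.0221, 0.0442) unassigned.
        return "not covered by the legend's ranges"
    if phi >= 0:
        return "4th-degree or further"
    # Negative estimates are described as unrelated in the legend.
    return "unrelated"


# Number of unordered pairs among 406 students.
n_pairs = comb(406, 2)
print(n_pairs)
print(classify_kinship(0.25), classify_kinship(-0.01))
```

The pair count reproduces the 82,215 pairs stated in the legend.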

Table 1. Summary of GLOBETROTTER’s inference of the date of admixture and composition of admixing sources for each of four geographic regions of Denmark

                                                                            Admixture event #1               Admixture event #2
Region              Best fitting type  Goodness     Years before      Year  Minor      Major      Ratio      Minor      Major      Ratio
                    of admixture       of fit (R2)  present [95% CI]        component  component             component  component
Zealand (N = 36)    1 date multi-way   0.208        943 [579, 1488]   1052  BRI        SWE        0.47:0.53  GER        POL        0.47:0.53
S.Jutland (N = 37)  1 date multi-way   0.136        756 [232, 1548]   1239  FRA        POL        0.21:0.79  GER        POL        0.43:0.57
C.Jutland (N = 89)  1 date multi-way   0.354        611 [418, 823]    1384  BRI        POL        0.31:0.69  POL        GER        0.45:0.55
N.Jutland (N = 48)  1 date multi-way   0.239        434 [118, 741]    1561  BRI        POL        0.33:0.67  GER        POL        0.42:0.58

Type of admixture was chosen among three alternatives: (i) one date; (ii) one date multi-way; (iii) two or more dates. Multi-way admixture involves two admixture events. If a source population appears in both admixture events #1 and #2, then the interpretation is “three-way admixture” (for more details see Busby et al. 2015). R2: goodness of fit for a single date of admixture, taking the maximum value across all inferred co-ancestry curves (R2 ≥ 0.2 corresponds to good fit). GLOBETROTTER returns date estimates in generations. We multiplied these estimates by 30 years/generation. We retrieved historical dates by subtracting year estimates from 1995 (the median year of birth for our high school students). CI: confidence interval.
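The date conversion described in the note can be sketched as follows. This is a minimal illustration, not the authors' code: the function and constant names are hypothetical, and only the two constants stated in the note (30 years per generation, reference year 1995) and the "Years before present" values from Table 1 are used.

```python
YEARS_PER_GENERATION = 30   # stated in the table note
REFERENCE_YEAR = 1995       # median year of birth of the students


def years_before_present(generations: float) -> float:
    """Convert a date estimate in generations to years before present."""
    return generations * YEARS_PER_GENERATION


def calendar_year(years_bp: float) -> int:
    """Convert years before present to a calendar year."""
    return round(REFERENCE_YEAR - years_bp)


# "Years before present" point estimates as listed in Table 1.
table1_years_bp = {
    "Zealand": 943,
    "S.Jutland": 756,
    "C.Jutland": 611,
    "N.Jutland": 434,
}

table1_years = {
    region: calendar_year(years) for region, years in table1_years_bp.items()
}
print(table1_years)
```

Applied to the point estimates, this reproduces the "Year" column of Table 1 (1052, 1239, 1384 and 1561). The same subtraction applied to the bounds of a confidence interval gives the corresponding range of calendar years, with the order of the bounds reversed.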

Table 2. Percentage of membership of 13 European countries (including six well-defined geographic regions in Denmark shown in the inset) to each of K = 4 ancestral populations

                Ancestral component
Country/region  % Southern (blue*)  % Eastern (yellow*)  % Central (red*)  % Nordic (green*)
PT              72.43               8.59                 12.15             6.83
ES              78.32               3.57                 9.25              8.86
FR              45.70               15.19                19.23             19.88
BE              35.64               14.06                20.84             29.46
GB              30.08               32.28                15.20             22.45
NL              24.88               20.06                23.62             31.44
DE              23.10               17.22                22.40             37.28
AT              20.28               17.44                19.88             42.39
HU              29.41               49.76                11.83             9.00
PL              3.59                76.19                9.83              10.38
Ca              5.39                21.27                24.94             48.40
Ze              7.73                18.90                25.67             47.71
Fu              6.51                20.36                28.28             44.85
SJ              5.94                19.49                28.94             45.64
CJ              7.34                13.35                30.30             49.01
NJ              5.82                13.23                32.35             48.60
NO              8.56                13.24                30.91             47.29
SE              4.98                17.39                27.27             50.35

*This table complements Figure 4; country/region abbreviations are explained in Figure 2.
