Van den Eynden et al. Human Genomics (2018) 12:6
DOI 10.1186/s40246-018-0136-8

PRIMARY RESEARCH (Open Access)

The genetic structure of the Belgian population

Jimmy Van den Eynden1,2*, Tine Descamps1, Els Delporte1, Nancy H. C. Roosens1, Sigrid C. J. De Keersmaecker1, Vanessa De Wit1, Joris Robert Vermeesch3, Els Goetghebeur4, Jean Tafforeau1, Stefaan Demarest1, Marc Van den Bulcke1 and Herman Van Oyen1,5*

Abstract

Background: National and international efforts like the 1000 Genomes Project are leading to increasing insights in the genetic structure of populations worldwide. Variation between different populations necessitates access to population-based genetic reference datasets. These data, which are important not only in clinical settings but also to potentiate future transitions towards a more personalized public health approach, are currently not available for the Belgian population.

Results: To obtain a representative genetic dataset of the Belgian population, participants in the 2013 National Health Interview Survey (NHIS) were invited to donate saliva samples for DNA analysis. DNA was isolated and single nucleotide polymorphisms (SNPs) were determined using a genome-wide SNP array of around 300,000 sites, resulting in a high-quality dataset of 189 samples that was used for further analysis. A principal component analysis demonstrated the typical European genetic constitution of the Belgian population, as compared to other continents. Within Europe, the Belgian population could be clearly distinguished from other European populations. Furthermore, obvious signs from recent migration were found, mainly from Southern Europe and Africa, corresponding with migration trends from the past decades. Within Belgium, a small north-west to south-east gradient in genetic variability was noted, with differences between Flanders and Wallonia.

Conclusions: This is the first study on the genetic structure of the Belgian population and its regional variation.
The Belgian genetic structure mirrors its geographic location in Europe with regional differences and clear signs of recent migration.

Keywords: Genetic variability, Population genomics, Public health genomics

* Correspondence: [email protected]; [email protected]
1Scientific Institute of Public Health, Brussels, Belgium
Full list of author information is available at the end of the article

© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

After the completion of the Human Genome Project in 2003, international efforts were initiated to map human genetic variation between populations. This variation has been described for 26 populations worldwide via the 1000 Genomes Project [1, 2]. While this is a valuable resource for studying global genetic variation, both the number of samples per population and the total number of populations studied are relatively low. To gain sufficient statistical power and avoid false positives/negatives due to unmatched control populations in genotype-phenotype association studies, population-based genetic reference data are required from more specific and extended populations [3]. To address this, population-based whole genome sequencing initiatives have been performed at the national level throughout Europe [4–7]. From these genetic population studies, it has become clear that there is a strong correlation between the geographical location and the genetic structure of different populations [8, 9], also at the more regional level (e.g., along a north-south axis in the Netherlands) [6, 7, 9, 10]. Currently, this genetic information is not available for the Belgian population.

Therefore, a population-based cross-sectional study, called BelPHG-21 (Belgian Public Health Genomics in the twenty-first century), was organized that aims at describing the genetic variability in the Belgian population and relating this variability to several indicators of health and disease to anticipate a future transition towards precision public health. To accomplish this, the BelPHG-21 study was organized in the context of the 2013 Belgian National Health Interview Survey (NHIS). The NHIS is a population-based survey that has been periodically organized since 1997. It has the purpose to assess the health status and its distribution in the population and to describe the association of this health status with its main determinants [11].

Here, we report on the BelPHG-21 results related to the genetic structure of the Belgian population. By incorporating NHIS information about the study subject’s residence and parental country of birth, we demonstrate small but clear genetic differences between different regions and illustrate how the population’s genetic structure is shaped by recent migration waves.

Results

Factors determining consent for former NHIS participants to participate in the BelPHG-21 population genetic study

To study the genetic structure and variability in the Belgian population, a subset of participants from the most recent NHIS, conducted on 10,829 inhabitants in 2013 [12], was invited to donate saliva samples for DNA analysis (Fig. 1).
From the invited subsample of 1468 individuals, 210 (14%) consented and submitted saliva samples for DNA analysis. Using the NHIS information related to demographics, education, employment, income, health behavior, lifestyle characteristics, physical and social environment, and wellbeing characteristics, we applied a predictive weighted binomial regression analysis to identify the factors that determined consent to participate in this study. This resulted in a model containing four variables (smoking behavior, education, age, and region of residence) (Table 1). The odds for consent were higher in non-smokers (and higher in former smokers than never smokers), higher age, and higher educational attainment. Furthermore, the likelihood for consent was also higher in the Flemish Region compared to the other regions.

Fig. 1 BelPHG-21 study flow chart, Belgium 2016

Table 1 Variables predicting consent, BelPHG-21 study, Belgium 2016

Variable                        Parameter          OR     95% CI
Smoking habits (ref "Smoker")   Former smoker      2.21   [1.38, 3.62]
                                Never smoker       1.38   [0.92, 2.15]
Age                             Age (years)        1.02   [1.01, 1.02]
Residence (ref "Flanders")      Brussels region    0.45   [0.24, 0.77]
                                Walloon region     0.58   [0.41, 0.80]
Education (ref "No diploma")    Lower secondary    2.29   [1.38, 7.02]
                                Higher secondary   4.13   [2.12, 9.50]
                                Higher             6.66   [3.40, 15.44]
                                Not applicable     5.31   [2.25, 14.06]

A weighted binomial regression analysis to predict consent contained four variables (smoking habits, age, residence, and education). For each parameter, the odds ratio (OR) and 95% confidence interval (CI) are given

DNA sampling from the Belgian population

DNA was extracted from saliva samples obtained from 210 consenting participants. Three of these samples contained insufficient saliva, while the DNA concentration of eight other samples was considered too low for downstream analysis (Fig. 1). From the remaining 199 samples, single nucleotide polymorphisms (SNPs) at approximately 300,000 sites were measured using the whole genome scanning Illumina HumanCytoSNP-12 microarray. Microarray results from 10 samples were excluded due to insufficient call rates (i.e., lower than 0.6), resulting in 189 samples that were used for further analysis. These samples were genotyped using the hg19 human genome build as a reference and variant allele frequencies were calculated for a total number of 261,079 SNPs for which genotyping information was available (Additional file 1: Table S1).

Belgium is composed of three different geographical regions: Flanders, Brussels, and Wallonia. The 189 samples that were used for analysis in this study were donated by volunteers in all three regions: 98 from Flanders, 62 from Wallonia, and 29 from Brussels (Fig. 2). These numbers are proportional to the population size of the respective regions with an overrepresentation from the Brussels region (15.3, 17.3, and 24.7 samples per million inhabitants for Flanders, Wallonia, and Brussels respectively) (Additional file 2: Table S2). This oversampling of the Brussels region is related to the way the NHIS was constructed [11]. Similarly, the distribution of these samples was relatively proportional to the population size from each of 10 provinces in Belgium with a variation from 11.3 to 26.7 samples per million

The Belgian population is a typical European population with genetic signals of recent migration from the African continent

We first compared the genetic structure of the Belgian population with other populations worldwide. To restrict this analysis to the most informative and independent SNPs, only SNPs located on autosomes with a minor allele frequency of at least 5%, less than 2% missing values, and a linkage disequilibrium lower than 0.2 were used for structural comparison. These filtering criteria resulted in a selec-