What’s in the Dutch Genome?
Cisca Wijmenga, UMC, Groningen / Dorret Boomsma, VU, Amsterdam
Exciting times for genetics and large-scale sequencing
1000 Genomes Project (Oct 2015)
Results of an early next-generation DNA sequencing run. Recent advances enable parallel sequencing reactions that can produce up to 500 gigabases of data per instrument run. These data allow us to better understand the frequency of genetic mutations and their impact on human disease.
Genome of the Netherlands, GoNL (Aug 2014)
What’s in the Dutch genome?
GoNL: 7.6 million novel SNVs, with an excess of rare nonsense variants and frameshift indels, …
An average individual: 60 loss-of-function SNVs, 69 loss-of-function indels, 15 loss-of-function large deletions, and 20 disease-causing variants, …
SNV=Single Nucleotide Variant; Indel = insertion or deletion of bases
What makes GoNL unique? (1) Representative sample of the Dutch population
Selection criteria
• Caucasian background ((grand)parents born in the same region)
• Equal distribution across provinces
• Unselected for disease
Boomsma et al., EJHG 2014
Made possible by ~1M Dutch inhabitants participating in biobank research
• LifeLines cohort study (UMCG)
• Netherlands Twin Register (VU)
• Leiden Longevity Study (LUMC)
• Rotterdam Elderly Study (Erasmus MC)
>200 biobanks participate in BBMRI-NL
BBMRI-NL: Biobanking and Biomolecular Resources Research Infrastructure, NL
Important: major genetic differences between the North and the South
Abdellaoui et al. (2013) Eur J Hum Genet
Important: major genetic differences between the North and the South
[Maps: Catholic, Protestant and non-religious regions; NL in 1849 and NL today]
Abdellaoui et al. (2013) Association between autozygosity and major depression: stratification due to religious assortment. Behavior Genetics
Ancestry-informative PCs replicated in next-generation sequencing dataset GoNL
What makes GoNL unique? (2) Trio design
• Excellent haplotypes – high-quality imputation
• Insight into de novo events
GoNL: investigating de novo SNV and SV mutations from whole genomes
[Figure: trio pedigree of father, mother and child]
GoNL: 11,020 de novo single nucleotide variants (SNVs) and 332 de novo structural variants (SVs)
11,020 high-confidence de novo mutations = 18–74 per offspring
74% of new mutations are derived from fathers and increase with age
Pearson’s correlation = 0.47, P < 2.2 × 10⁻¹⁶
[Figure: trio genotypes. Father A/A, mother A/A, child A/G]
G has never been observed before; G is regarded as a new mutation
+1.47 new mutations per extra year (age)
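The trio logic on this slide can be sketched in a few lines of Python. This is only an illustration, not the GoNL calling pipeline: the function name `de_novo_alleles` and the optional `seen_before` set (alleles already observed elsewhere) are hypothetical, and real callers also weigh read depth and genotype likelihoods.

```python
def de_novo_alleles(father, mother, child, seen_before=frozenset()):
    """Return child alleles that neither parent carries.

    Genotypes are given as tuples of allele letters, e.g. ("A", "G").
    `seen_before` is a hypothetical set of alleles already observed at
    this site in other samples; such alleles are not reported as new.
    """
    parental = set(father) | set(mother)
    return {a for a in child if a not in parental and a not in seen_before}


# Genotypes from the slide: both parents A/A, child A/G.
novel = de_novo_alleles(("A", "A"), ("A", "A"), ("A", "G"))
print(novel)  # {'G'}
```

If one parent already carries G, the function returns an empty set, because the allele is then inherited rather than new.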
The Genome of the Netherlands Consortium, Nat Genet 2014
GoNL includes monozygotic twins: germ line vs somatic mutation rate
~97% of de novo mutations are germ line
~3% of de novo mutations are somatic
Are de novo mutation rates correlated with replication timing?
Early replicating DNA (gene-rich regions)
Late replicating DNA (gene-poor regions)
Do the paternal age effects have functional consequences? Most likely!
Number of genic (red) and intergenic (blue) de novo mutations in offspring (log scale) as a function of paternal age. The red line is the regression for genic mutations; the blue line is the regression for intergenic mutations. The steeper slope of the red line indicates a faster relative increase in the rate of genic mutations with paternal age.
+0.26% de novo mutations in genic regions per extra year of paternal age.
Offspring of 40-year-old vs 20-year-old fathers: 19.06 vs 9.63 genic mutations
Elevated mutation rates driven by CpGs
Francioli et al., Nat Genet 2015
The distribution of de novo mutations is non-random
Closely spaced mutations are enriched both across and within individuals: 78 mutation clusters of up to 20 kb in length are observed, and 1.5% of all de novo mutations fall in such clusters.
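A minimal sketch of how mutations on one chromosome could be grouped into clusters spanning at most 20 kb. The greedy grouping rule, the function name `find_clusters` and the example positions are assumptions made for illustration; the published analysis may define clusters differently.

```python
def find_clusters(positions, max_span=20_000):
    """Greedily group positions into clusters no longer than max_span bp.

    A cluster starts at the first unassigned position and absorbs every
    later position within max_span of that start. Groups with a single
    mutation are not reported as clusters.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        if current and pos - current[0] > max_span:
            if len(current) >= 2:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= 2:
        clusters.append(current)
    return clusters


# Hypothetical de novo mutation positions on a single chromosome.
positions = [1_000, 5_000, 18_000, 500_000, 900_000, 905_000]
clusters = find_clusters(positions)
clustered_fraction = sum(len(c) for c in clusters) / len(positions)
print(clusters)            # [[1000, 5000, 18000], [900000, 905000]]
print(clustered_fraction)  # fraction of mutations that sit in a cluster
```

Comparing the clustered fraction with what random placement would give is what makes a statement such as "closely spaced mutations are enriched" testable.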
Francioli et al., Nat Genet 2015
GoNL also allowed investigation of de novo structural variations (SVs)
Reference
SNP
Deletion
Insertion
Tandem duplication
Profiling de novo structural variants requires a host of different methods
Kloosterman et al., Genome Research 2015
One in seven children bears a de novo SV mutation
Six offspring had two and one offspring had three de novo SVs
Kloosterman et al., Genome Research 2015
An excess of de novo structural changes originated on paternal haplotypes
A significantly larger fraction (66.1%) of indels and SVs arise on paternal chromosomes than on maternal chromosomes
No significant correlation between de novo structural change occurrence and paternal age (limited number of observations?)
Kloosterman et al., Genome Research 2015
Differences in impact of de novo SNVs, indels and SVs
7× more de novo SNVs per child (average 45), but de novo SVs affect ~100 times more genomic DNA (4,084 bp)
18 times more genes are hit by de novo SNVs versus SVs
Kloosterman et al., Genome Research 2015
Patients with “neurodevelopmental” disease carry more de novo SNVs
How to investigate the ‘pathogenicity’ of de novo and rare SNVs and SVs?
Take advantage of the biobanks participating in BBMRI-NL
Systematic analysis of potentially pathogenic variants to assist clinical interpretation
Low to moderate risk: shared across populations
Higher risk: often more population-specific
General population vs clinical population
Biobanks allow context dependent interpretation of genetic variation
Downstream effects (RNA, metabolites, methylation, …)
Clinical information
Lifestyle
Family history
Unique BBMRI-NL resources: Biobank-based Integrative Omics Study
N = ~4000
Epigenome
Environmental and genetic variance change with age
• 33% of methylation sites: significant change in methylation level with age
• 10% of methylation sites: significant change of the genetic or environmental variance with age
• Most of these (82%): unique environmental variance changes
• Prevailing pattern: the higher the age, the larger the total variance in methylation, and the larger the unique environmental variance (E)
There is an age related shift in the causes of variation in DNA methylation between people
Example of a rare allele that impacts allele-specific expression (and protein levels)
A deficiency of mannan-binding lectin is associated with susceptibility to infections and with the development of immunologic disease (NEJM 2003)
MASP2: c.359A>G (p.Asp120Gly); G allele MAF: 0.02
[Figure: Asp and Gly variants for the A and G alleles]
In particular more difficult when moving away from families
A 5-year journey to obtain an ultra-sharp portrait of the Dutch population
[Timeline 2010 to 2015: pilot, N=20 (2011); N=250; SNP calling complete; de novo SNVs characterized; Variant Finder online; de novo SVs characterized. Phases: data generation, data processing, data analysis]
GoNL data can be used in many different ways
• Reference for medical genetics
• Population sequencing
• Complex disease genetics (imputation)
Access to data can be requested through nlgenome.nl
Imputation of Dutch biobanks with GoNL identifies novel gene variants
• Reference for medical genetics
• Population sequencing
• Complex disease genetics (imputation)
Michigan Imputation Server
A new SNP platform: NL-Axiom array using GoNL
Imputation backbone / modules: UK Biobank Axiom array / Psych chip / chromosome X / GWA catalogue
Virtual array
• GoNL reference sequence data; NL-Axiom virtual platform
• SNP extraction + QC: MAF > 0.01; Missing Ss > 0.90; Missing SNPs > 0.95; HWE > 10⁻⁵
• Array: 618,889 SNPs; N = 250 unrelated females
• Phasing + imputation against 1000G phase 3
• Test imputed genotype concordance with original GoNL SNPs, looking at ~10 million SNPs
[Figure: imputation quality of NL-Axiom for SNPs also in 1000G; median R2 (0.0 to 1.0) by minor allele frequency bin (0.001-0.01, 0.01-0.05, 0.05-0.50)]
[Figure: genotype concordance; categories Bad (0-80% concordance), OK (80-95%), Good (>95%); values shown: 1.7, 8.4, 89.9]
Imputation of Dutch biobanks with GoNL identifies novel gene variants
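The concordance check on the NL-Axiom slide (imputed genotypes compared with the original GoNL calls) could be sketched as follows. This is an illustrative assumption, not the method actually used: genotypes are coded as alternate-allele counts (0, 1, 2), the function names are hypothetical, and how values falling exactly on 80% or 95% are assigned is a guess.

```python
def concordance(original, imputed):
    """Fraction of samples whose imputed genotype equals the original call."""
    if not original or len(original) != len(imputed):
        raise ValueError("genotype lists must be non-empty and of equal length")
    matches = sum(o == i for o, i in zip(original, imputed))
    return matches / len(original)


def concordance_category(value):
    """Map a concordance fraction to the slide's Bad / OK / Good labels."""
    if value > 0.95:
        return "Good"
    if value >= 0.80:  # assumption: exactly 80% counts as OK
        return "OK"
    return "Bad"


# Hypothetical SNP genotyped in four samples (alternate-allele counts).
original = [0, 1, 2, 1]
imputed = [0, 1, 2, 0]
c = concordance(original, imputed)
print(c, concordance_category(c))  # 0.75 Bad
```

Applied per SNP across roughly 10 million SNPs, tallying the three labels would give a breakdown of the kind shown in the concordance figure.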
• Reference for medical genetics
• Population sequencing
• Complex disease genetics (imputation)
A rare missense variant in ABCA6 (p.Cys1359Arg), predicted to be deleterious
The frequency of the ABCA6 variant is 3.65-fold increased in GoNL (0.030 vs 0.008 in 1000GP)
One day, we may also understand the strange habits of the Dutch …
Using GCTA in a sample of distantly related NL individuals, the measured (and tagged) SNPs explain 25% of the variance in initiation. Chromosomes 4 and 18, previously linked with cannabis use and other addiction phenotypes, account for the largest amount of variance in initiation.
Thanks to the GoNL team
Laurent Francioli, Jenny Van Dongen, Gertjan Van Ommen, Abdel Abdellaoui, Dorret Boomsma, Eline Slagboom, Cisca Wijmenga, Paul de Bakker, Cornelia Van Duijn, Morris Swertz, Kai Ye
Tomorrow: much more about biobank research in the Netherlands