Whole-Genome Sequencing in French Canadians from Quebec


Supplementary Note


WHOLE-GENOME SEQUENCING PROTOCOL

1. Sequence data generation

1.1. Sequencing

As a first experiment, 94 samples were sequenced with a protocol of 2 samples per lane. The remaining samples were sequenced at 2.5 samples per lane. Overall, these two protocols resulted in an average coverage of 5.6x (Supplementary Figure 2).

1.2. Alignment and filtering

We implemented a bioinformatics pipeline following the best-practice recommendations for GATK 1.8. The pipeline combines the following steps for each sample:

- Hard-clip the adaptor sequences.
- Align raw reads with the Burrows-Wheeler Aligner (BWA) using the option -M for secondary alignments.
- Mark and remove duplicates with Picard.
- Realign the sequences locally around indels with GATK's IndelRealigner. The intervals for IndelRealigner were created with GATK's RealignerTargetCreator from the 1000G and Mills reference indels.
- Fix mate information with Picard.
- Recalibrate base quality scores with GATK's BaseRecalibrator.
- Calculate the depth of coverage with GATK's DepthOfCoverage (Supplementary Figure 2).

2. Variant calling

The following steps were performed across all samples:

- Call variants with GATK's UnifiedGenotyper with a call confidence of Q10, recommended for low-pass whole-genome sequencing. Variant calling was performed in parallel on intervals of 5 million base pairs and the resulting files were merged with GATK.
- Apply the Variant Quality Score Recalibration (VQSR) procedure from GATK to filter spurious variants. It consists of two steps (VariantRecalibrator and ApplyRecalibration) applied sequentially to SNPs and indels. The called variants were compared to the training sets HapMap 3.3, Omni2.5M overlapping with 1000G, 1000G phase I indels, dbSNP 137, and Mills and 1000G gold standard indels for the following features: depth of coverage, quality/depth, Fisher test on strand bias, mapping quality rank sum test, rank sum test for relative positioning of reference alleles versus alternative alleles, haplotype score and inbreeding coefficient. We then used a cutoff of 99% sensitivity to remove outliers, considered as spurious calls.
- Phase the sequences and impute sporadically missing genotypes with Beagle v4. For each chromosome, the VCF file was split into overlapping intervals of 24,000 variants with the Beagle utility "splitvcf". Genotype likelihoods [...] Imputed and phased files were then merged using the Beagle utility "mergevcf", which trims the ends of overlapping files and aligns phased overlapping genotypes using the heterozygote genotype nearest the middle of the overlap.
- Annotate variants with GATK's variant annotation. Variant novelty was assessed against dbSNP 147.
- Control the quality of the sequence data with PLINK. We selected partially independent (r2 < 0.5) bi-allelic variants with MAF > 1% to perform quality-control checks. Variants out of Hardy-Weinberg equilibrium (p-value < 1.68e-9) were removed. The inbreeding coefficient for each sample and the estimate of pairwise IBD for each pair of samples were calculated with PLINK. The distributions of the F values and of the PI_HAT coefficients were plotted with R for visual inspection.
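The scheme of running variant calling in parallel on 5-million-base-pair intervals can be sketched as follows. This is only an illustration, not the pipeline code used in the study: the function name `split_into_intervals`, the 1-based inclusive coordinate convention, and the example chromosome name and length are assumptions made for the example.

```python
def split_into_intervals(chrom, length, size=5_000_000):
    """Return 1-based inclusive (chrom, start, end) windows covering a chromosome."""
    intervals = []
    start = 1
    while start <= length:
        # The last window is truncated at the end of the chromosome.
        end = min(start + size - 1, length)
        intervals.append((chrom, start, end))
        start = end + 1
    return intervals


# Hypothetical chromosome name and length, used only for illustration.
example = split_into_intervals("chrN", 12_345_678)
for chrom, start, end in example:
    print(f"{chrom}:{start}-{end}")
```

Each window could then be passed to a separate variant-calling job, with the per-window outputs merged afterwards.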

3. Concordance with the ExomeChip data

We used PLINK with option diff to calculate the overall concordance between both datasets for bi-allelic SNPs across 1,967 shared individuals. Of ~240,000 variants from the ExomeChip, ~97,000 are polymorphic in the 1,967 samples and ~83,000 are present in the WGS dataset. To build the contingency table, we recoded both files, including monomorphic markers in the ExomeChip, in additive components with PLINK. The contingency table was created with the R function "table".
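As an illustration of the cross-tabulation step, the sketch below builds a 3x3 contingency table from two vectors of additive genotype codes. It is a Python analogue of what the R function "table" produces, not the authors' script; the function name `contingency_table`, the toy genotype vectors, and the choice to skip pairs with a missing genotype are all assumptions.

```python
from collections import Counter


def contingency_table(chip, wgs):
    """Cross-tabulate two equally long lists of additive genotype codes (0, 1, 2).

    Rows index the first dataset, columns the second. Pairs where either
    genotype is missing (None) are skipped (an assumption of this sketch).
    """
    if len(chip) != len(wgs):
        raise ValueError("genotype vectors must have the same length")
    counts = Counter(
        (c, w) for c, w in zip(chip, wgs) if c is not None and w is not None
    )
    return [[counts[(row, col)] for col in (0, 1, 2)] for row in (0, 1, 2)]


# Toy data, invented for illustration only.
chip_calls = [0, 0, 1, 2, 1, 0, None, 2]
wgs_calls = [0, 1, 1, 2, 1, 0, 0, 1]
table = contingency_table(chip_calls, wgs_calls)
for row in table:
    print(row)
```

Diagonal cells count concordant genotype pairs; off-diagonal cells count discordant ones.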

Supplementary Table 1. Descriptive characteristics of the 1,970 individuals in the cohort. We report the percentages of levels for factors and the mean (standard deviation (SD)) for continuous variables.

| Characteristic | Distribution |
|---|---|
| Sex | 1,419 males (72%) |
| Age (years), mean (SD) | 66.4 (9.5) |
| Myocardial infarction (cases/controls) | 50% cases |
| Statins | 1,520 users (77%) |
| HDL-cholesterol (mg/dL), mean (SD) | 49.4 (12.2) |
| LDL-cholesterol (mmol/L), mean (SD) | 2.4 (0.9) |
| Total cholesterol (mmol/L), mean (SD) | 4.1 (1.1) |
| Triglycerides (mg/dL), mean (SD) | 165.1 (85.6) |

Supplementary Table 2. Contingency table of 83,273 shared bi-allelic variants between the ExomeChip and the whole-genome sequencing datasets in 1,967 individuals. The numbers (0, 1, and 2) correspond to the number of minor alleles (additive coding): 0, homozygote for the major allele; 1, heterozygote; 2, homozygote for the minor allele. Rows are ExomeChip genotypes and columns are whole-genome sequence genotypes.

| ExomeChip \ Whole-genome sequence | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 139,742,102 | 79,554 | 2,188 |
| 1 | 175,455 | 19,056,194 | 41,272 |
| 2 | 519 | 31,836 | 4,662,778 |

Supplementary Table 3. Contingency table of 25,877 shared bi-allelic common variants (MAF > 5%) between the ExomeChip and the whole-genome sequencing datasets in 1,967 individuals. The numbers (0, 1, and 2) correspond to the number of minor alleles (additive coding): 0, homozygote for the major allele; 1, heterozygote; 2, homozygote for the minor allele. Rows are ExomeChip genotypes and columns are whole-genome sequence genotypes.

| ExomeChip \ Whole-genome sequence | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 28,136,325 | 58,299 | 2,157 |
| 1 | 92,753 | 17,889,143 | 39,881 |
| 2 | 333 | 29,681 | 4,649,384 |

Supplementary Table 4. Contingency table of 14,128 shared bi-allelic low-frequency variants (0.5% < MAF ≤ 5%) between the ExomeChip and the whole-genome sequencing datasets in 1,967 individuals. The numbers (0, 1, and 2) correspond to the number of minor alleles (additive coding): 0, homozygote for the major allele; 1, heterozygote; 2, homozygote for the minor allele. Rows are ExomeChip genotypes and columns are whole-genome sequence genotypes.

| ExomeChip \ Whole-genome sequence | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 26,756,971 | 12,803 | 27 |
| 1 | 46,361 | 955,747 | 1,326 |
| 2 | 103 | 1,772 | 13,180 |

Supplementary Table 5. Contingency table of 43,268 shared bi-allelic rare variants (MAF ≤ 0.5%) between the ExomeChip and the whole-genome sequencing datasets in 1,967 individuals. The numbers (0, 1, and 2) correspond to the number of minor alleles (additive coding): 0, homozygote for the major allele; 1, heterozygote; 2, homozygote for the minor allele. Rows are ExomeChip genotypes and columns are whole-genome sequence genotypes.

| ExomeChip \ Whole-genome sequence | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 84,848,806 | 8,452 | 4 |
| 1 | 36,341 | 211,304 | 65 |
| 2 | 83 | 383 | 214 |
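The counts in Supplementary Tables 2 to 5 can be turned into concordance rates with a few lines of code. The sketch below uses the cell counts reported above; the function names and the second metric (concordance after excluding the cell where both datasets call the homozygous major genotype) are our own illustrative choices and are not taken from the source.

```python
# Rows: ExomeChip genotype (0, 1, 2); columns: whole-genome sequence genotype (0, 1, 2).
TABLES = {
    "all": [[139_742_102, 79_554, 2_188],
            [175_455, 19_056_194, 41_272],
            [519, 31_836, 4_662_778]],
    "common": [[28_136_325, 58_299, 2_157],
               [92_753, 17_889_143, 39_881],
               [333, 29_681, 4_649_384]],
    "low-frequency": [[26_756_971, 12_803, 27],
                      [46_361, 955_747, 1_326],
                      [103, 1_772, 13_180]],
    "rare": [[84_848_806, 8_452, 4],
             [36_341, 211_304, 65],
             [83, 383, 214]],
}


def overall_concordance(t):
    """Fraction of genotype pairs on the diagonal of a 3x3 table."""
    total = sum(sum(row) for row in t)
    return sum(t[i][i] for i in range(3)) / total


def concordance_excluding_major_homozygotes(t):
    """Concordance after dropping the (0, 0) cell (illustrative metric, an assumption)."""
    total = sum(sum(row) for row in t) - t[0][0]
    return (t[1][1] + t[2][2]) / total


# The three MAF-stratified tables add up, cell by cell, to the overall table.
strata = ("common", "low-frequency", "rare")
summed = [[sum(TABLES[s][i][j] for s in strata) for j in range(3)] for i in range(3)]

for name, t in TABLES.items():
    print(name,
          round(overall_concordance(t), 5),
          round(concordance_excluding_major_homozygotes(t), 5))
```

Because most genotypes at rare variants are homozygous for the major allele in both datasets, the overall rate is dominated by that cell; the second metric gives a stricter view for rare variants.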

Supplementary Table 6. Association results with known myocardial infarction SNPs. Association results between 44 validated MI/CAD variants and MI status in 980 controls and 984 cases. The direction of the effect is evaluated based on the published literature (Beaudoin et al. 2015; CARDIoGRAMplusC4D Consortium 2013; Coronary Artery Disease (C4D) Genetics Consortium 2011; Myocardial Infarction Genetics Consortium 2009; Schunkert H et al. 2011).

| SNP | Chr. | Position (hg19) | Effect allele | Other allele | Odds ratio | P-value | Locus | Right direction |
|---|---|---|---|---|---|---|---|---|
| rs11206510 | 1 | 55496039 | C | T | 0.879962 | 0.12414 | PCSK9 | yes |
| rs17114036 | 1 | 56962821 | G | A | 0.989284 | 0.924352 | PPAP2B | yes |
| rs602633 | 1 | 109821511 | G | T | 1.007665 | 0.929197 | SORT1 | yes |
| rs4845625 | 1 | 154422067 | C | T | 0.98037 | 0.774867 | IL6R | yes |
| rs17465637 | 1 | 222823529 | C | A | 1.093878 | 0.218495 | MIA3 | yes |
| rs16986953 | 2 | 19942473 | A | G | 1.006279 | 0.96267 | AK097927 | yes |
| rs515135 | 2 | 21286057 | C | T | 0.903565 | 0.230024 | APOB | no |
| rs6544713 | 2 | 44073881 | C | T | 0.982902 | 0.810287 | ABCG5/ABCG8 | yes |
| rs1561198 | 2 | 85809989 | T | C | 1.05769 | 0.407355 | GGCX/VAMP8 | yes |
| rs2252641 | 2 | 145801461 | C | T | 1.058626 | 0.405351 | ZEB2-AC074093.1 | yes |
| rs6725887 | 2 | 203745885 | C | T | 1.245375 | 0.0239992 | WDR12 | yes |
| rs9818870 | 3 | 138122122 | T | C | 1.054038 | 0.584404 | MRAS | yes |
| rs7692387 | 4 | 156635309 | A | G | 1.020639 | 0.808259 | GUCY1A3 | no |
| rs273909 | 5 | 131667353 | G | A | 1.005735 | 0.958063 | SLC22A4/SLC22A5 | yes |
| rs12526453 | 6 | 12927544 | G | C | 0.734291 | 1.64E-05 | PHACTR1 | yes |
| rs17609940 | 6 | 35034800 | C | G | 0.931894 | 0.423021 | ANKS1A | yes |
| rs10947789 | 6 | 39174922 | C | T | 0.913203 | 0.241793 | KCNK5 | yes |
| rs12190287 | 6 | 134214525 | G | C | 0.905416 | 0.172732 | TCF21 | yes |
| rs3798220 | 6 | 160961137 | C | T | 1.106651 | 0.639108 | LPA | yes |
| rs4252120 | 6 | 161143608 | C | T | 1.001762 | 0.981448 | PLG | no |
| rs2023938 | 7 | 19036775 | C | T | 0.95293 | 0.657919 | HDAC9 | no |
| rs10953541 | 7 | 107244545 | T | C | 0.86059 | 0.0522937 | 7q22 | yes |
| rs11556924 | 7 | 129663496 | T | C | 0.973526 | 0.6946 | ZC3HC1 | yes |
| rs264 | 8 | 19813180 | A | G | 1.03218 | 0.74269 | LPL | no |
| rs2954029 | 8 | 126490972 | T | A | 0.953663 | 0.478418 | TRIB1 | yes |
| rs4977574 | 9 | 22098574 | G | A | 1.221609 | 0.0034186 | CDKN2A/CDKN2B | yes |
| rs579459 | 9 | 136154168 | C | T | 1.086602 | 0.286566 | ABO | yes |
| rs2505083 | 10 | 30335122 | C | T | 0.982907 | 0.801063 | KIAA1462 | no |
| rs1746048 | 10 | 44775824 | T | C | 1.108604 | 0.280488 | CXCL12 | no |
| rs1412444 | 10 | 91002927 | T | C | 0.9444 | 0.43045 | LIPA | no |
| rs12413409 | 10 | 104719096 | A | G | 0.984778 | 0.900468 | CYP17A1 | yes |
| rs974819 | 11 | 103660567 | C | T | 0.789305 | 0.0008665 | PDGFD | yes |
| rs964184 | 11 | 116648917 | C | G | 0.925674 | 0.422174 | APOA5 | yes |
| rs3184504 | 12 | 111884608 | C | T | 0.882783 | 0.0667519 | SH2B3 | yes |
| rs9319428 | 13 | 28973621 | A | G | 1.029834 | 0.693763 | FLT1 | yes |
| rs4773144 | 13 | 110960712 | G | A | 1.055273 | 0.428492 | COL4A1 | yes |
| rs2895811 | 14 | 100133942 | C | T | 0.948926 | 0.441995 | HHIPL1 | no |
| rs3825807 | 15 | 79089111 | G | A | 0.924879 | 0.250282 | ADAMTS7 | yes |
| rs17514846 | 15 | 91416550 | A | C | 1.03743 | 0.594192 | FURIN/FES | yes |
| rs216172 | 17 | 2126504 | C | G | 1.143285 | 0.0546542 | SMG6 | yes |
| rs12936587 | 17 | 17543722 | A | G | 0.969528 | 0.645794 | RASD1 | yes |
| rs46522 | 17 | 46988597 | T | C | 1.214446 | 0.0040979 | UBE2Z | yes |
| rs2075650 | 19 | 45395619 | G | A | 1.091383 | 0.414265 | APOE/APOC1 | yes |
| rs9982601 | 21 | 35599128 | T | C | 1.073769 | 0.463521 | Gene desert/KCNE2 | yes |
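One simple way to summarize the "Right direction" column of Supplementary Table 6 is a one-sided sign (binomial) test: 35 of the 44 variants are marked "yes". The sketch below is offered only as an illustration of that idea; the source does not state that this test was performed, and the function name `binomial_upper_tail` and the null probability of 0.5 are assumptions.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_variants = 44    # variants listed in Supplementary Table 6
n_consistent = 35  # rows marked "yes" in the "Right direction" column

# Probability of at least 35 consistent directions if each direction were a coin flip.
p_value = binomial_upper_tail(n_consistent, n_variants)
print(f"{n_consistent}/{n_variants} consistent, one-sided sign test P = {p_value:.2e}")
```

The same function can be applied to the lipid tables by substituting their counts of "yes" rows.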

Supplementary Table 7. Association results with high-density lipoprotein (HDL)-cholesterol levels. Association results between 69 validated HDL-C variants and NMR-measured HDL-C levels in 1961 individuals. The direction of the effect is evaluated based on the published literature (Global Lipids Genetics et al. 2013).

| SNP | Chr. | Position (hg19) | Effect allele | Other allele | Beta | P-value | Locus | Right direction |
|---|---|---|---|---|---|---|---|---|
| rs12748152 | 1 | 27138393 | T | C | -0.02416 | 0.670583 | PIGV-NR0B2 | yes |
| rs4660293 | 1 | 40028180 | G | A | -0.04013 | 0.256613 | PABPC4 | yes |
| rs12145743 | 1 | 156700651 | G | T | 0.006448 | 0.83353 | HDGF-PMVK | yes |
| rs4650994 | 1 | 178515312 | A | G | -0.02028 | 0.482854 | ANGPTL1 | yes |
| rs1689800 | 1 | 182168885 | G | A | -0.00135 | 0.965083 | ZNF648 | yes |
| rs4846914 | 1 | 230295691 | A | G | 0.034592 | 0.252434 | GALNT2 | yes |
| rs12328675 | 2 | 165540800 | C | T | -0.05692 | 0.220916 | COBLL1 | no |
| rs1047891 | 2 | 211540507 | A | C | -0.04211 | 0.210325 | CPS1 | yes |
| rs2972146 | 2 | 227100698 | T | G | -0.02354 | 0.444504 | IRS1 | yes |
| rs2606736 | 3 | 11400249 | T | C | 0.017324 | 0.567889 | ATG7 | no |
| rs2290547 | 3 | 47061183 | A | G | -0.07392 | 0.043646 | SETD2 | yes |
| rs2013208 | 3 | 50129399 | T | C | 0.002127 | 0.942932 | RBM5 | yes |
| rs13326165 | 3 | 52532118 | G | A | -0.07417 | 0.039046 | STAB1 | yes |
| rs6805251 | 3 | 119560606 | C | T | -0.02737 | 0.370723 | GSK3B | yes |
| rs17404153 | 3 | 132163200 | T | G | -0.03184 | 0.4918 | ACAD11 | no |
| rs10019888 | 4 | 26062990 | G | A | -0.02408 | 0.536577 | C4orf52 | yes |
| rs3822072 | 4 | 89741269 | A | G | -0.05142 | 0.081569 | FAM13A | yes |
| rs2602836 | 4 | 100014805 | G | A | 0.01976 | 0.508839 | ADH5 | no |
| rs13107325 | 4 | 103188709 | T | C | -0.06399 | 0.254949 | SLC39A8 | yes |
| rs6450176 | 5 | 53298025 | A | G | -0.02463 | 0.472445 | ARL15 | yes |
| rs998584 | 6 | 43757896 | A | C | -0.04559 | 0.133252 | VEGFA | yes |
| rs1936800 | 6 | 127436064 | T | C | 0.0026 | 0.930297 | RSPO3 | no |
| rs605066 | 6 | 139829666 | T | C | 0.014442 | 0.634064 | CITED2 | yes |
| rs702485 | 7 | 6449272 | G | A | 0.025974 | 0.38521 | DAGLB | yes |
| rs4142995 | 7 | 17919258 | T | G | 0.032802 | 0.272891 | SNX13 | no |
| rs4917014 | 7 | 50305863 | G | T | 0.053863 | 0.08732 | IKZF1 | yes |
| rs17145738 | 7 | 72982874 | T | C | -0.01847 | 0.703558 | MLXIPL | no |
| rs4731702 | 7 | 130433384 | T | C | 0.031569 | 0.290598 | KLF14 | yes |
| rs17173637 | 7 | 150529449 | C | T | 0.063036 | 0.284621 | TMEM176A | no |
| rs9987289 | 8 | 9183358 | G | A | 0.140125 | 0.014275 | PPP1R3B | yes |
| rs12678919 | 8 | 19844222 | G | A | 0.166796 | 0.001312 | LPL | yes |
| rs2293889 | 8 | 116599199 | G | T | 0.012722 | 0.66846 | TRPS1 | yes |
| rs2954029 | 8 | 126490972 | T | A | 0.040291 | 0.170997 | TRIB1 | yes |
| rs581080 | 9 | 15305378 | C | G | 0.032137 | 0.391772 | TTC39B | yes |
| rs1883025 | 9 | 107664301 | T | C | -0.10398 | 0.002289 | ABCA1 | yes |
| rs970548 | 10 | 46013277 | C | A | -0.00177 | 0.958983 | MARCH8-ALOX5 | no |
| rs2923084 | 11 | 10388782 | G | A | -0.06214 | 0.095548 | AMPD3 | yes |
| rs3136441 | 11 | 46743247 | C | T | -0.02602 | 0.620435 | LRP4 | no |
| rs11246602 | 11 | 51512090 | C | T | -0.01026 | 0.934157 | OR4C46 | no |
| rs174546 | 11 | 61569830 | T | C | -0.06469 | 0.042778 | FADS1-2-3 | yes |
| rs12801636 | 11 | 65391317 | A | G | -0.01852 | 0.61353 | KAT5 | no |
| rs499974 | 11 | 75455021 | A | C | -0.06306 | 0.138125 | MOGAT2-DGAT2 | yes |
| rs964184 | 11 | 116648917 | C | G | 0.007633 | 0.857334 | APOA1 | yes |
| rs7941030 | 11 | 122522375 | C | T | -0.01228 | 0.687431 | UBASH3B | no |
| rs11613352 | 12 | 57792580 | T | C | 0.012389 | 0.735478 | LRP1 | yes |
| rs7134594 | 12 | 110000193 | T | C | 0.061651 | 0.036935 | MVK | yes |
| rs4765127 | 12 | 124460167 | T | G | 0.042604 | 0.175731 | ZNF664 | yes |
| rs838880 | 12 | 125261593 | T | C | -0.062 | 0.051738 | SCARB1 | yes |
| rs4983559 | 14 | 105277209 | A | G | -0.03272 | 0.288258 | ZBTB42-AKT1 | yes |
| rs1532085 | 15 | 58683366 | G | A | -0.12163 | 7.13E-05 | LIPC | yes |
| rs2652834 | 15 | 63396867 | G | A | -0.02413 | 0.521108 | LACTB | no |
| rs1121980 | 16 | 53809247 | A | G | -0.02851 | 0.341136 | FTO | yes |
| rs3764261 | 16 | 56993324 | A | C | 0.169506 | 7.06E-08 | CETP | yes |
| rs16942887 | 16 | 67928042 | A | G | 0.057233 | 0.198362 | LCAT | yes |
| rs2925979 | 16 | 81534790 | C | T | 0.02754 | 0.415632 | CMIP | yes |
| rs11869286 | 17 | 37813856 | C | G | 0.082138 | 0.007066 | STARD3 | yes |
| rs4148008 | 17 | 66875294 | G | C | -0.00399 | 0.9016 | ABCA8 | yes |
| rs4129767 | 17 | 76403984 | A | G | 0.035739 | 0.225678 | PGS1 | yes |
| rs7241918 | 18 | 47160953 | T | G | 0.098243 | 0.015866 | LIPG | yes |
| rs12967135 | 18 | 57849023 | A | G | 0.030025 | 0.390976 | MC4R | no |
| rs7255436 | 19 | 8433196 | A | C | 0.032359 | 0.273137 | ANGPTL4 | yes |
| rs737337 | 19 | 11347493 | C | T | -0.11168 | 0.053246 | ANGPTL8 | yes |
| rs731839 | 19 | 33899065 | A | G | -0.00395 | 0.897622 | PEPD | no |
| rs4420638 | 19 | 45422946 | G | A | -0.00563 | 0.89201 | APOE | yes |
| rs17695224 | 19 | 52324216 | A | G | 0.028195 | 0.388063 | HAS1 | no |
| rs386000 | 19 | 54792761 | C | G | -0.00211 | 0.957179 | LILRA3 | no |
| rs1800961 | 20 | 43042364 | T | C | -0.04699 | 0.595884 | HNF4A | yes |
| rs6065906 | 20 | 44554015 | C | T | -0.0611 | 0.091433 | PLTP | yes |
| rs181362 | 22 | 21932068 | T | C | -0.06322 | 0.079654 | UBE2L3 | yes |

Supplementary Table 8. Association results with low-density lipoprotein (LDL)-cholesterol levels. Association results between 57 validated LDL-C variants and biochemistry-measured LDL-C levels in 1957 individuals. The direction of the effect is evaluated based on the published literature (Global Lipids Genetics et al. 2013).

| SNP | Chr. | Position (hg19) | Effect allele | Other allele | Beta | P-value | Locus | Right direction |
|---|---|---|---|---|---|---|---|---|
| rs12027135 | 1 | 25775733 | T | A | 0.035073 | 0.193924 | LDLRAP1 | yes |
| rs12748152 | 1 | 27138393 | T | C | -0.01962 | 0.700688 | PIGV-NR0B2 | no |
| rs2479409 | 1 | 55504650 | A | G | 0.036365 | 0.193042 | PCSK9 | no |
| rs2131925 | 1 | 63025942 | T | G | 0.091285 | 0.001654 | ANGPTL3 | yes |
| rs629301 | 1 | 109818306 | T | G | 0.093314 | 0.005771 | SORT1 | yes |
| rs267733 | 1 | 150958836 | G | A | 0.005117 | 0.886032 | ANXA9-CERS2 | no |
| rs2642442 | 1 | 220973563 | T | C | 0.010813 | 0.701792 | MOSC1 | yes |
| rs514230 | 1 | 234858597 | T | A | 0.02447 | 0.357568 | IRF2BP2 | yes |
| rs1367117 | 2 | 21263900 | A | G | 0.022181 | 0.444408 | APOB | yes |
| rs4299376 | 2 | 44072576 | T | G | -0.02896 | 0.311142 | ABCG5/8 | yes |
| rs2710642 | 2 | 63149557 | A | G | 0.015678 | 0.58193 | EHBP1 | yes |
| rs10490626 | 2 | 118835841 | A | G | -0.06384 | 0.192931 | INSIG2 | yes |
| rs2030746 | 2 | 121309488 | T | C | 0.008852 | 0.744818 | LOC84931 | yes |
| rs1250229 | 2 | 216304384 | C | T | 0.011335 | 0.704707 | FN1 | yes |
| rs11563251 | 2 | 234679384 | T | C | 0.058716 | 0.179134 | UGT1A6 | yes |
| rs7640978 | 3 | 32533010 | T | C | 0.012622 | 0.775395 | CMTM6 | no |
| rs17404153 | 3 | 132163200 | T | G | 0.019817 | 0.634114 | ACAD11 | no |
| rs6831256 | 4 | 3473139 | G | A | -0.03407 | 0.202748 | LRPAP1 | yes |
| rs12916 | 5 | 74656539 | C | T | 0.02773 | 0.311215 | HMGCR | yes |
| rs4530754 | 5 | 122855416 | A | G | 0.022179 | 0.413245 | CSNK1G3 | yes |
| rs6882076 | 5 | 156390297 | C | T | 0.04631 | 0.095788 | TIMD4 | yes |
| rs3757354 | 6 | 16127407 | T | C | 0.042243 | 0.204185 | MYLIP | no |
| rs1800562 | 6 | 26093141 | A | G | -0.05909 | 0.329224 | HFE | yes |
| rs3177928 | 6 | 32412435 | A | G | -0.00161 | 0.967566 | HLA | no |
| rs9488822 | 6 | 116312893 | T | A | -0.00234 | 0.932696 | FRK | no |
| rs1564348 | 6 | 160578860 | C | T | 0.048283 | 0.206563 | LPA | yes |
| rs12670798 | 7 | 21607352 | C | T | 0.014769 | 0.644295 | DNAH11 | yes |
| rs4722551 | 7 | 25991826 | C | T | 0.028701 | 0.42456 | MIR148A | yes |
| rs2072183 | 7 | 44579180 | C | G | 0.095125 | 0.004249 | NPC1L1 | yes |
| rs9987289 | 8 | 9183358 | G | A | 0.01453 | 0.778717 | PPP1R3B | yes |
| rs10102164 | 8 | 55421614 | A | G | 0.072915 | 0.034794 | SOX17 | yes |
| rs2081687 | 8 | 59388565 | C | T | -0.02553 | 0.358707 | CYP7A1 | yes |
| rs2954029 | 8 | 126490972 | T | A | -0.03644 | 0.169176 | TRIB1 | yes |
| rs3780181 | 9 | 2640759 | G | A | -0.14245 | 0.006394 | VLDLR | yes |
| rs635634 | 9 | 136155000 | T | C | -0.01972 | 0.541278 | ABO | no |
| rs2255141 | 10 | 113933886 | G | A | -0.01872 | 0.524445 | GPAM | yes |
| rs174546 | 11 | 61569830 | T | C | 0.037369 | 0.195126 | FADS1-2-3 | no |
| rs964184 | 11 | 116648917 | C | G | -0.17013 | 8.06E-06 | APOA1 | yes |
| rs11220462 | 11 | 126243952 | A | G | 0.001822 | 0.963061 | ST3GAL4 | yes |
| rs11065987 | 12 | 112072424 | G | A | -0.03931 | 0.127323 | BRAP | yes |
| rs1169288 | 12 | 121416650 | C | A | -1.92E-06 | 0.999948 | HNF1A | no |
| rs4942486 | 13 | 32953388 | C | T | -0.00636 | 0.812111 | BRCA2 | yes |
| rs8017377 | 14 | 24883887 | A | G | 0.010361 | 0.699869 | NYNRIN | yes |
| rs3764261 | 16 | 56993324 | A | C | 0.03504 | 0.218775 | CETP | no |
| rs2000999 | 16 | 72108093 | A | G | 0.072762 | 0.029333 | HPR | yes |
| rs314253 | 17 | 7091650 | C | T | 0.008751 | 0.758338 | DLG4 | no |
| rs7206971 | 17 | 45425115 | A | G | -0.01287 | 0.627629 | OSBPL7 | no |
| rs1801689 | 17 | 64210580 | C | A | 0.115551 | 0.104573 | APOH-PRXCA | yes |
| rs6511720 | 19 | 11202306 | T | G | -0.11257 | 0.005985 | LDLR | yes |
| rs10401969 | 19 | 19407718 | C | T | -0.02603 | 0.635295 | CILP2 | yes |
| rs4420638 | 19 | 45422946 | G | A | 0.166647 | 7.47E-06 | APOE | yes |
| rs364585 | 20 | 12962718 | G | A | 0.00193 | 0.945048 | SPTLC3 | yes |
| rs2328223 | 20 | 17845921 | C | A | 0.045354 | 0.160204 | SNX5 | yes |
| rs2902940 | 20 | 39091487 | G | A | -0.04814 | 0.097662 | MAFB | yes |
| rs6029526 | 20 | 39672618 | A | T | 0.005881 | 0.825864 | TOP1 | yes |
| rs5763662 | 22 | 30378703 | T | C | 0.17693 | 0.052053 | MTMR3 | yes |
| rs4253772 | 22 | 46627603 | T | C | 0.009404 | 0.817868 | PPARA | no |

Supplementary Table 9. Association results with total cholesterol (TC) levels. Association results between 73 validated TC variants and biochemistry-measured TC levels in 1955 individuals. The direction of the effect is evaluated based on the published literature (Global Lipids Genetics et al. 2013).

| SNP | Chr. | Position (hg19) | Effect allele | Other allele | Beta | P-value | Locus | Right direction |
|---|---|---|---|---|---|---|---|---|
| rs1077514 | 1 | 23766233 | T | C | 0.074126 | 0.04589 | ASAP3 | yes |
| rs12027135 | 1 | 25775733 | T | A | 0.032674 | 0.216569 | LDLRAP1 | yes |
| rs2479409 | 1 | 55504650 | A | G | 0.023896 | 0.382341 | PCSK9 | no |
| rs2131925 | 1 | 63025942 | T | G | 0.083607 | 0.00325 | ANGPTL3 | yes |
| rs7515577 | 1 | 93009438 | A | C | 0.052912 | 0.103273 | EVI5 | yes |
| rs629301 | 1 | 109818306 | T | G | 0.096953 | 0.003377 | SORT1 | yes |
| rs2642442 | 1 | 220973563 | T | C | 0.02369 | 0.391214 | MOSC1 | yes |
| rs514230 | 1 | 234858597 | T | A | 0.022313 | 0.391491 | IRF2BP2 | yes |
| rs1367117 | 2 | 21263900 | A | G | 0.029167 | 0.30424 | APOB | yes |
| rs1260326 | 2 | 27730940 | C | T | -0.0842 | 0.001442 | GCKR | yes |
| rs4299376 | 2 | 44072576 | T | G | -0.02077 | 0.458032 | ABCG5/8 | yes |
| rs10490626 | 2 | 118835841 | A | G | -0.04531 | 0.345126 | INSIG2 | no |
| rs2030746 | 2 | 121309488 | T | C | 0.029887 | 0.261455 | LOC84931 | yes |
| rs7570971 | 2 | 135837906 | A | C | 0.021776 | 0.39747 | RAB3GAP1 | yes |
| rs2287623 | 2 | 169830155 | A | G | -0.029 | 0.279874 | ABCB11 | yes |
| rs11694172 | 2 | 203532304 | G | A | -0.01894 | 0.535104 | FAM117B | no |
| rs11563251 | 2 | 234679384 | T | C | 0.03381 | 0.42937 | UGT1A6 | yes |
| rs2290159 | 3 | 12628920 | C | G | -0.00012 | 0.996906 | RAF1 | yes |
| rs7640978 | 3 | 32533010 | T | C | -0.01236 | 0.775296 | CMTM6 | yes |
| rs13315871 | 3 | 58381287 | A | G | -0.05713 | 0.209438 | PXK | yes |
| rs6831256 | 4 | 3473139 | G | A | -0.02658 | 0.310178 | LRPAP1 | yes |
| rs12916 | 5 | 74656539 | C | T | 0.010426 | 0.697366 | HMGCR | yes |
| rs4530754 | 5 | 122855416 | A | G | -0.0022 | 0.934002 | CSNK1G3 | no |
| rs6882076 | 5 | 156390297 | C | T | 0.044012 | 0.106003 | TIMD4 | yes |
| rs3757354 | 6 | 16127407 | T | C | 0.02123 | 0.514639 | MYLIP | no |
| rs1800562 | 6 | 26093141 | A | G | -0.06022 | 0.309524 | HFE | yes |
| rs3177928 | 6 | 32412435 | A | G | -0.04001 | 0.300304 | HLA | no |
| rs2814982 | 6 | 34546560 | T | C | -0.02187 | 0.634887 | C6orf106 | yes |
| rs2758886 | 6 | 39250837 | A | G | 0.039222 | 0.162793 | KCNK17 | yes |
| rs9488822 | 6 | 116312893 | T | A | -0.02017 | 0.456334 | FRK | no |
| rs9376090 | 6 | 135411228 | C | T | -0.00436 | 0.889294 | HBS1L | no |
| rs1564348 | 6 | 160578860 | C | T | 0.039363 | 0.292733 | LPA | yes |
| rs1997243 | 7 | 1083777 | G | A | -0.03348 | 0.356002 | GPR146 | no |
| rs12670798 | 7 | 21607352 | C | T | 0.017019 | 0.586735 | DNAH11 | yes |
| rs4722551 | 7 | 25991826 | C | T | 0.022262 | 0.527006 | MIR148A | yes |
| rs2072183 | 7 | 44579180 | C | G | 0.077221 | 0.017798 | NPC1L1 | yes |
| rs9987289 | 8 | 9183358 | G | A | 0.074126 | 0.142747 | PPP1R3B | yes |
| rs1495741 | 8 | 18272881 | A | G | -0.01057 | 0.732662 | NAT2 | yes |
| rs10102164 | 8 | 55421614 | A | G | 0.062094 | 0.066337 | SOX17 | yes |
| rs2081687 | 8 | 59388565 | C | T | -0.02545 | 0.34999 | CYP7A1 | yes |
| rs2954029 | 8 | 126490972 | T | A | -0.03163 | 0.222923 | TRIB1 | yes |
| rs3780181 | 9 | 2640759 | G | A | -0.16496 | 0.00124 | VLDLR | yes |
| rs581080 | 9 | 15305378 | C | G | 0.021858 | 0.508715 | TTC39B | yes |
| rs1883025 | 9 | 107664301 | T | C | -0.00251 | 0.933504 | ABCA1 | yes |
| rs635634 | 9 | 136155000 | T | C | 0.016921 | 0.592389 | ABO | yes |
| rs10904908 | 10 | 17260290 | G | A | 0.028035 | 0.290006 | VIM-CUBN | yes |
| rs970548 | 10 | 46013277 | C | A | 0.048985 | 0.104915 | MARCH8-ALOX5 | no |
| rs2255141 | 10 | 113933886 | G | A | -0.03691 | 0.19976 | GPAM | yes |
| rs10128711 | 11 | 18632984 | C | T | 0.058775 | 0.051078 | SPTY2D1 | yes |
| rs174546 | 11 | 61569830 | T | C | 0.009153 | 0.74587 | FADS1-2-3 | no |
| rs964184 | 11 | 116648917 | C | G | -0.18696 | 5.27E-07 | APOA1 | yes |
| rs11603023 | 11 | 118486067 | C | T | 0.013701 | 0.599228 | PHLDB1 | no |
| rs7941030 | 11 | 122522375 | C | T | -0.04354 | 0.105523 | UBASH3B | no |
| rs11220462 | 11 | 126243952 | A | G | 0.001531 | 0.968286 | ST3GAL4 | yes |
| rs4883201 | 12 | 9082581 | G | A | 0.023764 | 0.621757 | PHC1-A2ML1 | no |
| rs11065987 | 12 | 112072424 | G | A | -0.01166 | 0.64427 | BRAP | yes |
| rs1169288 | 12 | 121416650 | C | A | -0.00331 | 0.908606 | HNF1A | no |
| rs1532085 | 15 | 58683366 | G | A | -0.065 | 0.016304 | LIPC | yes |
| rs3764261 | 16 | 56993324 | A | C | 0.095832 | 0.000572 | CETP | yes |
| rs2000999 | 16 | 72108093 | A | G | 0.085424 | 0.00893 | HPR | yes |
| rs314253 | 17 | 7091650 | C | T | 0.02843 | 0.306988 | DLG4 | no |
| rs7206971 | 17 | 45425115 | A | G | -0.01956 | 0.451474 | OSBPL7 | no |
| rs7241918 | 18 | 47160953 | T | G | 0.025905 | 0.470872 | LIPG | yes |
| rs6511720 | 19 | 11202306 | T | G | -0.05557 | 0.166487 | LDLR | yes |
| rs10401969 | 19 | 19407718 | C | T | -0.05537 | 0.302414 | CILP2 | yes |
| rs4420638 | 19 | 45422946 | G | A | 0.130521 | 0.000347 | APOE | yes |
| rs492602 | 19 | 49206417 | G | A | 0.007113 | 0.782944 | FLJ36070 | yes |
| rs2277862 | 20 | 34152782 | T | C | -0.0324 | 0.330694 | ERGIC3 | yes |
| rs2902940 | 20 | 39091487 | G | A | -0.02792 | 0.326568 | MAFB | yes |
| rs6029526 | 20 | 39672618 | A | T | 0.004974 | 0.84926 | TOP1 | yes |
| rs1800961 | 20 | 43042364 | T | C | -0.04652 | 0.552185 | HNF4A | yes |
| rs138777 | 22 | 35711098 | G | A | -0.0203 | 0.468936 | TOM1 | yes |
| rs4253772 | 22 | 46627603 | T | C | 0.031894 | 0.424757 | PPARA | yes |

Supplementary Table 10. Association results with triglycerides (TG) levels. Association results between 37 validated triglyceride variants and NMR-measured triglyceride levels in 1961 samples. The direction of the effect is evaluated based on the published literature (Global Lipids Genetics et al. 2013).

| SNP | Chr. | Position | Effect allele | Other allele | Beta | P-value | Locus | Right direction |
|---|---|---|---|---|---|---|---|---|
| rs12748152 | 1 | 27138393 | T | C | -0.0434 | 0.470668 | PIGV-NR0B2 | no |
| rs2131925 | 1 | 63025942 | T | G | 0.071416 | 0.036756 | ANGPTL3 | yes |
| rs4846914 | 1 | 230295691 | A | G | -0.04062 | 0.204606 | GALNT2 | yes |
| rs1260326 | 2 | 27730940 | C | T | -0.10414 | 0.001051 | GCKR | yes |
| rs2972146 | 2 | 227100698 | T | G | 0.060159 | 0.064945 | IRS1 | yes |
| rs645040 | 3 | 135926622 | T | G | -0.02736 | 0.454369 | MSL2L1 | no |
| rs6831256 | 4 | 3473139 | G | A | 0.023825 | 0.44946 | LRPAP1 | yes |
| rs442177 | 4 | 88030261 | T | G | 0.056562 | 0.083747 | KLHL8 | yes |
| rs9686661 | 5 | 55861786 | T | C | -0.05709 | 0.141972 | MAP3K1 | no |
| rs6882076 | 5 | 156390297 | C | T | 0.069604 | 0.033594 | TIMD4 | yes |
| rs998584 | 6 | 43757896 | A | C | 0.061482 | 0.055914 | VEGFA | yes |
| rs1936800 | 6 | 127436064 | T | C | -0.01007 | 0.749184 | RSPO3 | no |
| rs4722551 | 7 | 25991826 | C | T | -0.03775 | 0.373866 | MIR148A | no |
| rs13238203 | 7 | 72129667 | T | C | -0.17099 | 0.102716 | TYW1B | yes |
| rs17145738 | 7 | 72982874 | T | C | -0.0182 | 0.723533 | MLXIPL | yes |
| rs38855 | 7 | 116358044 | G | A | 0.002151 | 0.945454 | MET | no |
| rs11776767 | 8 | 10683929 | C | G | 0.012701 | 0.71097 | PINX1 | yes |
| rs1495741 | 8 | 18272881 | A | G | -0.11015 | 0.003013 | NAT2 | yes |
| rs12678919 | 8 | 19844222 | G | A | -0.16405 | 0.002865 | LPL | yes |
| rs2954029 | 8 | 126490972 | T | A | -0.09029 | 0.003727 | TRIB1 | yes |
| rs1832007 | 10 | 5254847 | G | A | 0.026312 | 0.566036 | AKR1C4 | no |
| rs10761731 | 10 | 65027610 | T | A | 0.029439 | 0.348313 | JMJD1C | no |
| rs2068888 | 10 | 94839642 | A | G | -0.01652 | 0.604931 | CYP26A1 | yes |
| rs174546 | 11 | 61569830 | T | C | 0.083389 | 0.013667 | FADS1-2-3 | yes |
| rs964184 | 11 | 116648917 | C | G | -0.30381 | 8.29E-12 | APOA1 | yes |
| rs11613352 | 12 | 57792580 | T | C | 0.037324 | 0.336606 | LRP1 | no |
| rs4765127 | 12 | 124460167 | T | G | -0.00287 | 0.931403 | ZNF664 | yes |
| rs2412710 | 15 | 42683787 | A | G | -0.06225 | 0.462594 | CAPN3 | no |
| rs2929282 | 15 | 44245931 | T | A | 0.01633 | 0.824583 | FRMD5 | yes |
| rs1532085 | 15 | 58683366 | G | A | 0.068357 | 0.035655 | LIPC | no |
| rs3198697 | 16 | 15129940 | T | C | -0.03093 | 0.338272 | PDXDC1 | yes |
| rs11649653 | 16 | 30918487 | G | C | -0.02226 | 0.492746 | CTF1 | yes |
| rs1121980 | 16 | 53809247 | A | G | 0.037818 | 0.23332 | FTO | no |
| rs3764261 | 16 | 56993324 | A | C | 0.0082 | 0.807008 | CETP | no |
| rs8077889 | 17 | 41878166 | C | A | 0.061035 | 0.11126 | MPP3 | yes |
| rs7248104 | 19 | 7224431 | A | G | -0.02805 | 0.378952 | INSR | yes |
| rs10401969 | 19 | 19407718 | C | T | -0.11757 | 0.067595 | CILP2 | yes |
| rs731839 | 19 | 33899065 | A | G | -0.03194 | 0.325506 | PEPD | yes |
| rs6065906 | 20 | 44554015 | C | T | -0.0396 | 0.30203 | PLTP | no |
| rs5756931 | 22 | 38546033 | C | T | -0.00032 | 0.992211 | PLA2G6 | yes |

Supplementary Table 11. Number of individuals with grandparents from outside Quebec, four regions of Quebec and the Maritime provinces. For instance, the dataset includes 24 individuals with four grandparents who were born in the Saguenay-Lac-St-Jean Northeast region of Quebec. Similarly, there are 1,751 individuals with no grandparent born outside of Quebec.

| Region | 0 grandparents | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Outside Quebec | 1,751 | 116 | 69 | 6 | 28 |
| Saguenay | 1,889 | 18 | 34 | 5 | 24 |
| Gaspesia | 1,864 | 26 | 36 | 10 | 34 |
| Quebec City | 1,726 | 88 | 86 | 23 | 47 |
| Montreal | 1,426 | 189 | 198 | 63 | 94 |
| New Brunswick | 1,927 | 9 | 19 | 1 | 14 |
| Nova Scotia | 1,965 | 2 | 2 | 1 | 0 |
| Prince Edward Island | 1,968 | 2 | 0 | 0 | 0 |

Supplementary Figure 1. Map of administrative regions in Quebec. The regions include Saguenay-Lac-St-Jean (2), Quebec City (3), Montreal (6) and Gaspesia (11). Source: http://www.electionsquebec.qc.ca/francais/provincial/carte-electorale/cartes-des-circonscriptions-electorales-par-region-2011.php.

Supplementary Figure 2. Distribution of the average depth of coverage among 1,970 individuals sequenced. The mean coverage is 5.6x.

Supplementary Figure 3. Boxplots and means of the estimated imputation accuracy. For each of three haplotype reference panels, we report the imputation quality for ~420K rare (MAF ≤ 0.5%), ~250K low-frequency (0.5% < MAF ≤ 5%), ~220K common (MAF > 5%), and all ~890K shared variants on chromosome 1. The number on each box corresponds to the mean Rsq quality score. 1000G, 5,008 haplotypes from phase 3 of the 1000 Genomes Project; FC, 3,940 haplotypes from this whole-genome DNA re-sequencing project in French Canadians; combined, combination of the 1000G and FC haplotypes.

Supplementary Figure 4. Boxplots and means of the squared correlation between masked genotypes and imputed dosages. For each of the haplotype reference panels, we report the imputation quality for ~10K rare (MAF ≤ 0.5%), ~20K low-frequency (0.5% < MAF ≤ 5%), ~45K common (MAF > 5%), and all ~75K shared non-monomorphic variants on chromosome 1. The number on each box corresponds to the mean EmpRsq quality score. FC, 3,940 haplotypes from this whole-genome DNA re-sequencing project in French Canadians; HRC, 64,976 haplotypes from the Haplotype Reference Consortium.
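The quantity summarized in Supplementary Figure 4, the squared correlation between masked genotypes and imputed dosages, can be illustrated for a single variant as follows. This is a minimal sketch under stated assumptions: the function name `squared_correlation` and the toy genotype and dosage values are invented for the example, and the imputation software may compute its EmpRsq score with details not reproduced here.

```python
def squared_correlation(genotypes, dosages):
    """Squared Pearson correlation between true genotypes (0/1/2) and imputed dosages.

    Returns NaN when either vector has zero variance, which is why
    monomorphic variants cannot be scored this way.
    """
    n = len(genotypes)
    if n != len(dosages) or n < 2:
        raise ValueError("need two vectors of equal length with at least 2 samples")
    mean_g = sum(genotypes) / n
    mean_d = sum(dosages) / n
    cov = sum((g - mean_g) * (d - mean_d) for g, d in zip(genotypes, dosages))
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    if var_g == 0 or var_d == 0:
        return float("nan")
    return cov * cov / (var_g * var_d)


# Toy values for one variant in six samples, invented for illustration.
masked_genotypes = [0, 1, 2, 0, 1, 0]
imputed_dosages = [0.1, 0.9, 1.8, 0.0, 1.2, 0.3]
r2 = squared_correlation(masked_genotypes, imputed_dosages)
print(f"squared correlation = {r2:.3f}")
```

Averaging this per-variant value within each MAF class would give summary numbers of the kind shown on the boxes.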

Supplementary Figure 5. Boxplots and means of the estimated imputation accuracy. For each of three haplotype reference panels, we report the imputation quality for ~380K rare (MAF ≤ 0.5%), ~230K low-frequency (0.5% < MAF ≤ 5%), ~200K common (MAF > 5%), and all ~810K shared variants on chromosome 1. The number on each box corresponds to the mean Rsq quality score. FC, 3,940 haplotypes from this whole-genome DNA re-sequencing project in French Canadians; HRC, 64,976 haplotypes from the Haplotype Reference Consortium.

Supplementary Figure 6. Projection of 1,970 individuals on C1 and C2 from MDS. Individuals are color-coded according to the number of their grandparents from Gaspesia (A), New Brunswick (B), Nova Scotia (C), and Prince Edward Island (D).

Supplementary Figure 7. Projection on C1 and C2 from MDS of 199 individuals whose four grandparents are from the same region (Montreal, Quebec, Saguenay or Gaspesia).

Supplementary Figure 8. Projection of 1,970 French-Canadian individuals and 2,504 individuals from the 1000 Genomes Project on C1 and C2 from MDS. Individuals are color-coded according to their ancestry.

Supplementary Figure 9. Projection of 1,970 French-Canadian individuals and 503 individuals from the European populations of the 1000 Genomes Project on C1 and C2 from MDS. Individuals are color-coded according to their ancestry.
