A Comprehensive Study of Metabolite Genetics Reveals Strong Pleiotropy and Heterogeneity Across Time and Context
Total Page:16
File Type:pdf, Size:1020Kb
ARTICLE https://doi.org/10.1038/s41467-019-12703-7 OPEN A comprehensive study of metabolite genetics reveals strong pleiotropy and heterogeneity across time and context Apolline Gallois1,12, Joel Mefford2,12, Arthur Ko 3, Amaury Vaysse 1, Hanna Julienne1, Mika Ala-Korpela4,5,6,7,8,9, Markku Laakso 10, Noah Zaitlen2,12*, Päivi Pajukanta 3,12*& Hugues Aschard 1,11,12* 1234567890():,; Genetic studies of metabolites have identified thousands of variants, many of which are associated with downstream metabolic and obesogenic disorders. However, these studies have relied on univariate analyses, reducing power and limiting context-specific under- standing. Here we aim to provide an integrated perspective of the genetic basis of meta- bolites by leveraging the Finnish Metabolic Syndrome In Men (METSIM) cohort, a unique genetic resource which contains metabolic measurements, mostly lipids, across distinct time points as well as information on statin usage. We increase effective sample size by an average of two-fold by applying the Covariates for Multi-phenotype Studies (CMS) approach, identifying 588 significant SNP-metabolite associations, including 228 new associations. Our analysis pinpoints a small number of master metabolic regulator genes, balancing the relative proportion of dozens of metabolite levels. We further identify associations to changes in metabolic levels across time as well as genetic interactions with statin at both the master metabolic regulator and genome-wide level. 1 Department of Computational Biology - USR 3756 CNRS, Institut Pasteur, Paris, France. 2 Department of Medicine, University of California, San Francisco, CA, USA. 3 Department of Human Genetics, University of California, Los Angeles, CA, USA. 4 Systems Epidemiology, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia. 5 Computational Medicine, Faculty of Medicine, University of Oulu and Biocenter Oulu, Oulu, Finland. 6 NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland. 7 Population Health Science, Bristol Medical School, University of Bristol, Bristol, UK. 8 Medical Research Council Integrative Epidemiology Unit at the University of Bristol, Bristol, UK. 9 Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Faculty of Medicine, Nursing and Health Sciences, The Alfred Hospital, Monash University, Melbourne, VIC, Australia. 10 Department of Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland. 11 Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA. 12These authors contributed equally: Apolline Gallois, Joel Mefford, Noah Zaitlen, Päivi Pajukanta, Hugues Aschard. *email: [email protected]; [email protected]; [email protected] NATURE COMMUNICATIONS | (2019)10:4788 | https://doi.org/10.1038/s41467-019-12703-7 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-12703-7 he human metabolome includes over 100,000 small towards richer understanding of genetic regulation of meta- molecules, ranging from peptides and lipids, to drugs and bolites as a function of environmental factors. T 1 pollutants . Because metabolites affect or are affected by a diverse set of biological processes, lifestyle and environmental Results 2 exposures, and disease states , they are routinely used bio- Powerful genome-wide screening.Wefirst performed GWAS of 3 markers . Thanks to the recent technological advances, diverse the 158 serum metabolites. These measurements consisted of 98 components of the metabolome are being measured in large lipoproteins components (42 very-low-density lipoprotein human cohorts, offering new opportunities to improve our (VLDL), 7 IDL, 21 low-density lipoprotein (LDL) and 28 high- understanding of the molecular mechanisms underlying meta- density lipoprotein (HDL)), 9 amino acids, 16 fatty acids, and 35 4 bolism and corresponding human traits and diseases . For other molecules (Supplementary Data 1 and Supplementary example, previous work has highlighted the role of metabolites in Fig. 1). GWAS was performed using standard linear regression 5,6 diseases such as Type 2 Diabetes , cardiovascular, and heart (STD), but also using the CMS approach21, a powerful method we 7–9 3,10 diseases , and obesity . recently developed for the analysis of multivariate data sets A number of genome-wide association studies (GWAS) of (Online Methods). For both methods, we tested association fi metabolites have also been performed. These studies identi ed between each SNP and each metabolite while adjusting for – 11 hundreds of genetic variant metabolite associations , provided potential confounding factors, including age and medical treat- 12 estimation of the heritability of multiple metabolites , and ments (statins, beta blockers, diuretics, and fibrate). To approx- highlighted the biological and clinical relevance of some of these imate the number of independent associations identified, we fi 13 ndings . All of these studies relied on standard univariate grouped significant SNPs in independent linkage disequilibrium analyses and assessed marginal additive effects of genetic variants (LD) blocks23, denoted further as regions (Online Methods). We only. Despite a number of advantages, univariate approaches have obtained 588 region-metabolite associations involving a total of limitations and the implementation of new integrative approaches 54 independent regions (Supplementary Data 2, 3). Figure 1a are needed to further reconstruct the complex genetic network of shows that these associations are spread over the 158 metabolites: – gene metabolite associations and its dependence on the context we found 399 associations with lipoproteins (189 with VLDL, 38 of each individual. with IDL, 88 with LDL and 84 with HDL), 17 with amino acids, Here, we explore the genetics of 158 serum metabolites 50 with fatty acids, and 122 with other molecules. Among these measured with nuclear magnetic resonance (NMR), including associations, 9 were significant with STD only (1.53%), 261 with mostly serum lipids, in 6263 unrelated men from the Finnish both STD and CMS (44.39%) and 318 (54.08%) with CMS only 14 Metabolic Syndrome In Men (METSIM) cohort. To improve (Supplementary Data 4). Overall, CMS led to a 118% increase in the detection of metabolite-associated variants and infer com- identified signals. Among the 588 region-metabolite associations plex mechanisms underlying the genetics of metabolites, we identified, 228 (involving 45 genes) were not identified at the perform a series of analyses using recently developed multi- same significance level by previous large-scale metabolite variate approaches as well as existing methods. First, unlike studies11,12,19,20,24–27 (Supplementary Data 1 and Supplementary 11,12,15–20 previous metabolite GWAS , we leverage the high Table 6). Note that most of these new associations (78%) involved correlation structure between metabolites to increase the power regions previously identified with total lipids (total cholesterol, of association tests via the Covariates for Multi-phenotype triglyceride, LDL, and HDL)28, but were not further refined into 21 Studies (CMS) method . Second, we produce an integrated specific particles. As illustrated in Fig. 1b, new associations exist – view of the genetic metabolite network, highlighting genes with for 107 of the 158 metabolites. Among these 228 associations, 1 strong pleiotropic effects while showing how integrated analysis was significant with STD only (0.4%), 63 with both STD and CMS of such genes can be leveraged to identify likely causal variants. (27.6%) and 163 (71.5%) with CMS only. For each new associa- Third, we examine variants with effects dependent on statin tion, we further mapped the top SNPs per region to their nearest fi treatment and age, two established modi ers of metabolite gene. Table 1 presents the aggregated results. fi 22 pro les and disease risk, using bivariate heritability and Overall, the CMS approach showed good performance in these interaction analyses. Overall, our analysis provides a step data. First, when comparing single SNPs effect estimates between a 9 CMS only 8 Standard and CMS 7 Standard only 6 5 4 3 All associations 2 1 0 A A A LA Cit IIe PC SM F F F Glc Lac Pyr Glol Ala Gln Gly His Leu Val Phe Tyr Ace Alb Gp EstC ApoB DHA FAw3FAw6 S Crea IDL_PIDL_L IDL_C LDL_D LDL_C FreeC TotPG ApoA1 Tot UnSat PU MUFA LA_FA AcAce IDL_PL IDL_CEIDL_FCIDL_TG HDL_D HDL_C TG_PG TotCho SFA_FA bOHBut L_LDL_PL_LDL_LL_LDL_C S_LDL_PS_LDL_L L_HDL_L VLDL_D Serum_CVLDL_C HDL2_CHDL3_C LDL_TGHDL_TG DHA_FAFAw3_FAFAw6_FA L_VLDL_L L_LDL_PL L_LDL_FCM_LDL_PM_LDL_LM_LDL_C S_LDL_C L_HDL_P L_HDL_C M_HDL_PM_HDL_LM_HDL_C S_HDL_PS_HDL_LS_HDL_C VLDL_TG PUFA_FAMUFA_FA L_VLDL_P L_VLDL_C M_VLDL_PM_VLDL_LM_VLDL_C S_VLDL_PS_VLDL_LS_VLDL_C L_LDL_CEL_LDL_TG M_LDL_PLM_LDL_CEM_LDL_FCM_LDL_TG S_LDL_PLS_LDL_CES_LDL_FCS_LDL_TGXL_HDL_PXL_HDL_LXL_HDL_C L_HDL_PLL_HDL_CEL_HDL_FCL_HDL_TG M_HDL_PLM_HDL_CEM_HDL_FCM_HDL_TG S_HDL_PLS_HDL_CES_HDL_FCS_HDL_TG Serum_TG XL_VLDL_PXL_VLDL_LXL_VLDL_C L_VLDL_PLL_VLDL_CEL_VLDL_FCL_VLDL_TG M_VLDL_PLM_VLDL_CEM_VLDL_FCM_VLDL_TG S_VLDL_PLS_VLDL_CES_VLDL_FCS_VLDL_TGXS_VLDL_PXS_VLDL_LXS_VLDL_C XL_HDL_PLXL_HDL_CEXL_HDL_FCXL_HDL_TG Remnant_C XXL_VLDL_PXXL_VLDL_LXXL_VLDL_C XL_VLDL_PLXL_VLDL_CEXL_VLDL_FCXL_VLDL_TG XS_VLDL_PLXS_VLDL_CEXS_VLDL_FCXS_VLDL_TG ApoB_ApoA1 XXL_VLDL_PLXXL_VLDL_CEXXL_VLDL_FCXXL_VLDL_TG b 0 1 2 3 4 5 6 7 New associations New 8 9 Fig. 1 Region-metabolite