Genetic Studies of Urinary Metabolites Illuminate Mechanisms of Detoxification and Excretion in Humans

Pascal Schlosser¹*, Yong Li¹*, Peggy Sekula¹*, Johannes Raffler², Franziska Grundner-Culemann¹, Maik Pietzner³,⁴, Yurong Cheng¹, Matthias Wuttke¹,⁵, Inga Steinbrenner¹, Ulla T. Schultheiss¹,⁵, Fruzsina Kotsis¹,⁵, Tim Kacprowski⁴,⁶,⁷, Lukas Forer⁸, Birgit Hausknecht⁹, Arif B. Ekici¹⁰, Matthias Nauck³,⁴, Uwe Völker⁴,⁶, GCKD Investigators**, Gerd Walz⁵, Peter J. Oefner¹¹, Florian Kronenberg⁸, Robert P. Mohney¹², Michael Köttgen⁵, Karsten Suhre¹³, Kai-Uwe Eckardt⁹,¹⁴, Gabriele Kastenmüller², Anna Köttgen¹

* these authors contributed equally
** a list of the GCKD Investigators is included in Supplementary Information

SUPPLEMENTARY INFORMATION

Table of Contents

SUPPLEMENTARY NOTE 1: SUPPLEMENTARY RESULTS
INCORPORATION OF EXISTING BIOLOGICAL KNOWLEDGE INTO CAUSAL GENE ASSIGNMENT
METABOLITE CLUSTERS PROVIDE BIOLOGICAL CONTEXT FOR YET UNNAMED METABOLITES
METABOLITE RATIOS CAPTURE INSIGHTS INTO PHYSIOLOGY AND PHARMACOGENETICS
ASSOCIATION BETWEEN NAT8-ASSOCIATED METABOLITES AND CKD PROGRESSION AND COMPLICATIONS
SUPPLEMENTARY NOTE 2: EXTENDED ACKNOWLEDGEMENTS
SUPPLEMENTARY FIGURE 1: OVERVIEW OF THE STUDY DESIGN
SEPARATE FILE ATTACHED (60 PAGES): SF2_RAP.pdf
SUPPLEMENTARY FIGURE 2: REGIONAL ASSOCIATION PLOTS FOR MQTLS IDENTIFIED IN MGWAS OF URINARY METABOLITE CONCENTRATIONS
SUPPLEMENTARY FIGURE 3: COMPARISON OF GENETIC EFFECTS WITH AND WITHOUT ADJUSTMENT FOR EGFR
SUPPLEMENTARY FIGURE 4: EVALUATION OF GENETIC ASSOCIATIONS OF REPLICATED MQTLS FROM CKD PATIENTS IN A HEALTHY POPULATION SAMPLE
SUPPLEMENTARY FIGURE 5: CELL TYPE-SPECIFIC EXPRESSION OF ASSOCIATED GENES IN MURINE KIDNEY
SUPPLEMENTARY FIGURE 6: ASSOCIATION BETWEEN THE INDEX SNP AT SLC7A9 AND PAIR-WISE METABOLITE RATIOS REVEALS TRANSPORTED SUBSTRATES IN VIVO
SUPPLEMENTARY FIGURE 7: OVERVIEW AND EXAMPLES OF METABOLITE CLUSTERING
SUPPLEMENTARY FIGURE 8: CIRCULAR PRESENTATION OF GENETIC ASSOCIATIONS WITH EIGENMETABOLITES
SUPPLEMENTARY FIGURE 9: IDENTIFICATION OF THE UNKNOWN METABOLITE X-13689 AS THE GLUCURONIDE OF ALPHA-CMBHC
SUPPLEMENTARY FIGURE 10: PRESENCE OF CO-LOCALIZING ASSOCIATION SIGNALS FOR URINARY METABOLITES AND PHENOTYPES AND DISEASES IN THE UK BIOBANK
REFERENCES

Supplementary Note 1: Supplementary Results

Incorporation of existing biological knowledge into causal gene assignment

The workflow to assign potentially causal genes in GWAS loci was agnostic with respect to existing biological and biochemical knowledge. Upon evaluation after the gene had been assigned, the great majority of automatically assigned genes were also supported by existing biochemical knowledge and experimental studies. There were a few instances, however, in which incorporation of existing biochemical and biological knowledge into the causal gene assignment process would have supported another gene in the locus (see Table). At these loci, the automated, knowledge-agnostic gene assignment may not have prioritized the correct causal gene. As unbiased databases to facilitate automated gene assignment become more complete, for example through the generation of gene expression data in additional tissues and cell types, the assignment of the most likely gene in a given region may be subject to change. Regardless, sensitivity analyses repeating all enrichment analyses with the use of these genes instead of the automatically assigned ones at these eight loci yielded the same or almost identical enriched pathways, tissues, and cell types (data not shown).

| Automatically assigned gene in locus (gene score) | Biologically supported gene in locus (gene score) | Associated known metabolite(s) or module(s) | Background on biologically supported gene |
|---|---|---|---|
| CASP9 (hremc) | AGMAT (hrem) | 4-guanidinobutanoate, beta-guanidinopropanoate | The enzyme encoded by AGMAT, agmatinase, metabolizes N-(4-aminobutyl)guanidine [http://www.hmdb.ca/] |
| LRP8 (hep) | CPT2 (h) | methylsuccinoylcarnitine | Succinoylcarnitine is a fatty acid. CPT2 encodes carnitine palmitoyltransferase II, which acts on fatty acids. |
| ZKSCAN5 (he) | CYP3A7 (NA) | 16a-hydroxy DHEA 3-sulfate, andro steroid monosulfate C19H28O6S (1)*, tauro-beta-muricholate | The enzyme encoded by CYP3A7 hydroxylates dehydroepiandrosterone 3-sulphate. |
| PPP2R4 (hrem) | CRAT (hre) | 2-methylmalonylcarnitine (C4-DC) | CRAT encodes carnitine O-acetyltransferase. This enzyme converts short- and medium-chain acyl-CoAs, to which 2-methylmalonylcarnitine belongs. |
| TRIM48 (NA) | FOLH1 (NA) | N-acetyl-aspartyl-glutamate (NAAG) | FOLH1 encodes folate hydrolase 1, which metabolizes NAAG [PMID: 9622670]. The index SNP rs61898064 that gives rise to the automated assignment of TRIM48 is in LD (r²=0.798) with another mQTL (rs55728336). This other mQTL is associated with NAAG, and its index SNP was automatically assigned to FOLH1. The two signals were not merged because the r² was not >0.8. |
| NUPR1 (he), CCDC101 (he) | SULT1A2 (he) | 3-hydroxyindolin-2-one sulfate, furaneol sulfate | The enzyme encoded by SULT1A2 catalyzes the sulfate conjugation of a wide variety of molecules. |
| TYMS (ohre) | ENOSF1 (ohe) | ribonate | The enzyme encoded by ENOSF1 plays a role in the catabolism of the deoxy sugar L-fucose. Ribonate is a sugar acid. |
| CABP5 (e) | SULT2A1 (e) | androstenediol (3beta,17beta) disulfate (1) | The encoded enzyme, dehydroepiandrosterone sulfotransferase, catalyzes the sulfation of steroids and bile acids including DHEA, of which androstenediol is a direct metabolite. |
| BTN3A1 (hem) | SLC17A1/A3/A4 (h) | ME41 | The metabolites assigned to this module belong to a substrate class transported by the SLC17A transporter family. The individually significant metabolite in the cluster, indolelactate, is assigned to SLC17A1/A3/A4. |
| RAB11FIP5 (h) | NAT8 (NA) | ME160, ME161, ME166 | All known metabolites assigned to the listed clusters are N-acetylated compounds. N-acetylation is a key function of the enzyme encoded by NAT8. |

Metabolite clusters provide biological context for yet unnamed metabolites

Metabolites are intermediates of homeostatic reactions and as such are inter-connected beyond pair-wise relationships. Groups of correlated metabolites (“modules”) may reflect shared biochemical pathways or co-regulation. We used a weighted gene co-expression analysis-based approach to construct 212 metabolite modules (Methods, Supplementary Figure 7A). GWAS of the modules’ first principal component, the eigenmetabolite, identified 46 significant (P<2.3e-10 [5e-8/212]) and replicated associations between genetic variants and 38 unique metabolite modules (Supplementary Figure 8, Supplementary Table 12). In three instances, the gene scored as most likely to be causal was not part of the 90 genes identified in the single metabolite screen. One of them, CPT2, is also supported by biological evidence but did not receive the highest score within the locus in the single metabolite screen (see above). At the other four genes, biological evidence points toward a different gene than the one automatically assigned (see above, Supplementary Table 12 + 15).
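The two quantities used in this paragraph, the eigenmetabolite and the corrected significance threshold, can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions and not the software used in the study: the function names, the power-iteration approach, and the toy values are hypothetical, and each metabolite is assumed to be fully measured in every sample.

```python
import math


def standardize(values):
    """Center values to mean 0 and scale them to unit sample variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / sd for x in values]


def eigenmetabolite(module, n_iter=200):
    """Return (loadings, scores) of the first principal component.

    `module` holds one list of per-sample values for each metabolite.
    Power iteration is used for transparency; it assumes the first
    metabolite has a non-zero loading on the leading component.
    """
    cols = [standardize(c) for c in module]
    p, n = len(cols), len(cols[0])
    # Correlation matrix of the module's metabolites.
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1)
             for j in range(p)] for i in range(p)]
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every sample onto the leading component.
    scores = [sum(v[j] * cols[j][s] for j in range(p)) for s in range(n)]
    return v, scores


def bonferroni_threshold(alpha, n_tests):
    """Genome-wide threshold divided by the number of tested modules."""
    return alpha / n_tests


# Hypothetical toy module: three metabolites measured in five samples.
toy_module = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.1, 9.8],
    [0.9, 2.2, 2.8, 4.1, 5.2],
]
loadings, scores = eigenmetabolite(toy_module)
threshold = bonferroni_threshold(5e-8, 212)  # about 2.36e-10
```

The quotient 5e-8/212 is approximately 2.36e-10; the text above quotes it as 2.3e-10 and the legend of Supplementary Figure 8 as 2.4e-10.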

Eigenmetabolites that showed particularly strong genetic associations originated from a module of five unknown metabolites (missense rs2147896 in PYROXD2, P=2.5e-917) and a module composed of N2-acetyllysine, N-alpha-acetylornithine, X-12124, X-12125, and X-15666 (missense rs13538 in NAT8, P=4.3e-635; Supplementary Figure 7B-E). Such associations are suggestive of a common function of the enzyme on metabolites in the module, implicating the unknown molecules in the NAT8-associated module as additional N-acetylated compounds or their precursors. Similarly, a module of the known vitamin E (tocopherol)-related metabolites also contained the two unknowns X-13689 and X-24359 (Supplementary Table 12) and was associated with rs55744319, which is in high LD with a missense variant in CYP4F2, encoding p.Val433Met. This variant has previously been identified in studies of the response to vitamin E supplementation¹, vitamin E levels², and warfarin maintenance dose²,³. Investigation of the unknown metabolites based on their mass, retention time, spectral information and genetic evidence nominated the unknown molecules as structurally related to vitamin E, with the glucuronide of alpha-CMBHC as a candidate for X-13689. We experimentally verified this prediction by comparing the retention times from ion chromatograms and the locations and intensities of the peaks in the MS/MS fragmentation spectra between a standard of the glucuronide of alpha-CMBHC and X-13689 (Supplementary Figure 9). Thus, knowledge of an unknown’s module membership and its genetic association can provide information beyond mass and retention time by restricting the search space of its possible identities for experimental verification.

While there were no genetic associations with modules that did not contain at least one metabolite also identified by mGWAS, screening of eigenmetabolites provided the important advantage of permitting a hypothesis-generating screen of higher-order genetic associations, whereas the assessment of all pair-wise metabolite ratios alone would have amounted to 686,206 GWAS. Furthermore, 35 of 46 eigenmetabolite associations implicate additional metabolites that were not identified in mGWAS after correction for multiple testing (Supplementary Figure 8, Supplementary Table 12). We have made our software, used for identification of eigenmetabolites, publicly available (https://github.com/genepi-freiburg/Netboost), which may be of particular interest for emerging large-scale integrative Omics efforts.
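The number of GWAS quoted for the pair-wise ratio screen follows from simple counting, as the short Python check below illustrates. The constant names are ours; the values 1,172 metabolites and 212 modules are taken from the text.

```python
import math

N_METABOLITES = 1172  # metabolites analyzed, as stated in the text
N_MODULES = 212       # eigenmetabolites screened instead

# Each unordered pair of metabolites defines one ratio, since a/b and
# b/a differ only in sign on the log scale.
n_pairwise_ratios = math.comb(N_METABOLITES, 2)
reduction_factor = n_pairwise_ratios / N_MODULES

print(n_pairwise_ratios)        # 686206
print(round(reduction_factor))  # 3237
```

Screening 212 eigenmetabolites therefore requires roughly three orders of magnitude fewer genome-wide scans than screening every pair-wise ratio.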

Metabolite ratios capture insights into physiology and pharmacogenetics

The renal tubular amino acid exchanger SLC7A9 is known to exchange dibasic amino acids such as lysine from urine against intracellular neutral amino acids in model systems⁴,⁵. We therefore screened all pair-wise metabolite ratios for this known exchanger. A p-gain threshold of 6,728,320 (672,832*10) was used to identify ratios that contributed information beyond their individual components, where 672,832 represents the number of tested ratios (1172*1171/2 = 686,206 possible ratios, minus 13,374 ratios with fewer than 300 measurements). The index SNP rs12460876, associated with differential SLC7A9 expression (Supplementary Table 3), was related to 83 informative metabolite ratios (P-gain >6.7e6, Supplementary Figure 6). Of these, all ratios that contained lysine also contained neutral amino acids such as phenylalanine, threonine, glutamine, or alanine, reflecting the exchanger's known physiological function, amino acid exchange at the apical membrane of tubular epithelial cells⁴. In this screen, 5-hydroxy-lysine and the unknown metabolite X-24736 emerged as novel candidate substrates of this exchanger (Supplementary Figure 6). Based on spectral information, mass and retention time, X-24736 is likely an arginine-containing metabolite, consistent with the uptake of dibasic amino acids from urine. The identification of novel candidate substrates of known transport proteins is of high interest for pharmaceutical research, both with respect to a target’s therapeutic potential and to anticipate potential side effects.
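The threshold arithmetic of the ratio screen can be reproduced as follows. This is a sketch under one labeled assumption: the p-gain is taken to be the commonly used quantity, the smaller of the two single-metabolite P-values divided by the P-value of the ratio, because the Methods are not part of this supplement. The function name and the example P-values are hypothetical; the count of 13,374 excluded ratios is taken from the legend of Supplementary Figure 6.

```python
def p_gain(p_numerator, p_denominator, p_ratio):
    """Smaller single-metabolite P-value divided by the ratio's P-value.

    Assumed definition; the supplement refers to the Methods for details.
    """
    return min(p_numerator, p_denominator) / p_ratio


N_METABOLITES = 1172
N_EXCLUDED = 13_374  # ratios with fewer than 300 measurements

n_possible = N_METABOLITES * (N_METABOLITES - 1) // 2  # 686,206
n_tested = n_possible - N_EXCLUDED                      # 672,832
threshold = 10 * n_tested                               # 6,728,320

# Hypothetical P-values for one SNP, chosen only for illustration.
gain = p_gain(p_numerator=1e-12, p_denominator=1e-9, p_ratio=1e-30)
informative = gain > threshold
```

A ratio is flagged as informative only when its association is stronger than that of either component by more than ten times the number of tested ratios.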

Association between NAT8-associated metabolites and CKD progression and complications

Given the high and near-exclusive expression of NAT8 in kidney, we tested whether the 30 NAT8-associated metabolites may carry information about the risk of CKD progression and CKD-related endpoints complementary to eGFR, for example by capturing detoxification capacity. N-acetylation is an important reaction in a major route of detoxification, the generation of water-soluble mercapturic acids⁶. Spearman correlation coefficients of the 30 metabolites with eGFR, the main measure of kidney function in clinical practice, were weak (range -0.17 to 0.24). We assessed the association of the NAT8-associated metabolites with incident end-stage kidney disease (ESKD, n=61), incident major cardiovascular events (MACE, n=143) and all-cause mortality (n=129; Methods). In comparison to a model with clinical information alone, inclusion of metabolites significantly improved the model fit for ESKD (P=1.5e-4, Methods, Supplementary Table 14). While higher urinary concentrations of X-13698 and N-acetylglutamine were protective (HR=0.59 for both), N-acetylkynurenine, N-acetylcitrulline, N-delta-acetylornithine and X-12125 were associated with higher risk (HR range: 1.17-1.47). These metabolites therefore represent potential new biomarkers of ESKD, along with altered N-acetylation capacity as an implicated mechanism, for evaluation in larger studies of CKD progression.
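For readers less familiar with the rank-based correlation used above, the following Python sketch computes a Spearman coefficient as the Pearson correlation of average ranks. It is an illustration only: the function names and the five paired values are hypothetical and are not data from the study.

```python
def average_ranks(values):
    """Rank values from 1 to n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical metabolite levels and eGFR values for five participants.
metabolite = [0.2, 1.5, 0.7, 3.1, 2.2]
egfr = [35, 48, 52, 60, 41]
rho = spearman(metabolite, egfr)
```

Because only ranks enter the calculation, the coefficient is unaffected by monotone transformations such as taking logarithms of metabolite concentrations.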

Supplementary Note 2: Extended acknowledgements

List of GCKD Study Investigators

A list of nephrologists currently collaborating with the GCKD study is available at http://www.gckd.org.

University of Erlangen-Nürnberg: Kai-Uwe Eckardt, Heike Meiselbach, Markus Schneider, Thomas Dienemann, Hans-Ulrich Prokosch, Barbara Bärthlein, Andreas Beck, Thomas Ganslandt, André Reis, Arif B. Ekici, Susanne Avendaño, Dinah Becker-Grosspitsch, Ulrike Alberth-Schmidt, Birgit Hausknecht, Rita Zitzmann, Anke Weigel

University of Freiburg: Gerd Walz, Anna Köttgen, Ulla Schultheiß, Fruzsina Kotsis, Simone Meder, Erna Mitsch, Ursula Reinhard

RWTH Aachen University: Jürgen Floege, Georg Schlieper, Turgay Saritas, Sabine Ernst, Nicole Beaujean

Charité, University Medicine Berlin: Elke Schaeffner, Seema Baid-Agrawal, Kerstin Theisen

Hannover Medical School: Hermann Haller, Jan Menne

University of Heidelberg: Martin Zeier, Claudia Sommerer, Rebecca Woitke

University of Jena: Gunter Wolf, Martin Busch, Rainer Fuß

Ludwig-Maximilians University of München: Thomas Sitter, Claudia Blank

University of Würzburg: Christoph Wanner, Vera Krane, Antje Börner-Klein, Britta Bauer

Medical University of Innsbruck, Division of Genetic Epidemiology: Florian Kronenberg, Julia Raschenberger, Barbara Kollerits, Lukas Forer, Sebastian Schönherr, Hansi Weissensteiner

University of Regensburg, Institute of Functional Genomics: Peter Oefner, Wolfram Gronwald, Helena Zacharias

Department of Medical Biometry, Informatics and Epidemiology (IMBIE), University of Bonn: Matthias Schmid, Jennifer Nadal

Supplementary Figure 1: Overview of the study design

[Figure: two panels, A and B.]

Schematic representation of the genome-wide screens for single metabolites (A) and eigenmetabolites (B) and their follow-up analyses.

Separate file attached (60 pages): SF2_RAP.pdf

Supplementary Figure 2: Regional association plots for mQTLs identified in mGWAS of urinary metabolite concentrations

For each of the 240 mQTLs, the region for plotting was selected as the outer borders of merged overlapping 1-Mb windows. The extended MHC region was treated as one region. The index SNP with the lowest p-value is indicated. The metabolite giving rise to the association is included in the title. Linkage disequilibrium information, used to color-code correlation with the index SNP, was calculated from the analyzed subsample of the GCKD study.

Supplementary Figure 3: Comparison of genetic effects with and without adjustment for eGFR

[Figure: scatter plot, both axes ranging from −2 to 2. X-axis: residualized for genetic PCs + age + sex + ln(eGFR) + ln(UACR). Y-axis: residualized for genetic PCs + age + sex.]

Each point represents one of the 240 replicated metabolite-associated mQTLs. Genetic effect size estimates per modeled risk allele including adjustment for eGFR and UACR (X-axis), as done in the main analysis, were plotted against those obtained after adjustment for genetic PCs, age and sex only (Y-axis).

Supplementary Figure 4: Evaluation of genetic associations of replicated mQTLs from CKD patients in a healthy population sample

[Figure: six scatter-plot panels, A to F, all axes ranging from −2 to 2. X-axis: effect size in GCKD. Y-axis: effect size in healthy population. Dot-size legend: −log10(p-value) from 11 (12 in panel B) to 309. Color legends: (C) LC/MS Pos Early, LC/MS Neg, LC/MS Polar; (D) Amino Acid, Lipid, Nucleotide, Xenobiotics, Unknown, other; (E) RSD GCKD − RSD SHIP >0.08, RSD GCKD − RSD SHIP <−0.08, abs(RSD GCKD − RSD SHIP) <0.08, NA; (F) 0< % imputed values GCKD <7, 7< % imputed values GCKD <22, 22< % imputed values GCKD.]

(A) Each point represents the index SNP of one of 90 associations that could be matched between the Metabolon platforms of the GCKD and SHIP-trend studies. Dot size is proportional to the -log10(P-value) in GCKD and crosses represent 1.96x standard errors in each study. The red line corresponds to a linear regression based on the effect estimates of the most significant index SNP in each of the 35 unique genetic regions into which the 90 associations map. (B) 81 mQTLs with -log10(P-value) > 12 are plotted. In subsequent panels, the color codes correspond to the detection mode of the mass spectrometer (C), metabolite super pathway (D), differences in relative standard deviation based on measurements of duplicate samples as a measure of precision (E), and the percent of imputed values in GCKD (F). For strata with at least 10 matched mQTLs, additional regression lines were added.
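The two plot elements described in this legend, the 1.96x standard error crosses and the regression line, correspond to the calculations sketched below in Python. This is an illustration under assumptions: an unweighted least-squares fit is assumed for the regression line, the function names are ours, and the effect sizes are hypothetical values rather than study results.

```python
def ci95(beta, se):
    """Approximate 95% confidence interval: estimate +/- 1.96 x SE."""
    return beta - 1.96 * se, beta + 1.96 * se


def ols_line(x, y):
    """Slope and intercept of an unweighted least-squares line y ~ x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical effect estimates for five index SNPs in the two studies.
effect_gckd = [-0.8, -0.3, 0.2, 0.6, 1.1]
effect_healthy = [-0.7, -0.35, 0.25, 0.5, 1.0]

slope, intercept = ols_line(effect_gckd, effect_healthy)
lower, upper = ci95(beta=0.6, se=0.05)
```

A slope close to 1 with an intercept close to 0 would indicate that genetic effects are of similar size in CKD patients and in the healthy population sample.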

Supplementary Figure 5: Cell type-specific expression of associated genes in murine kidney

[Figure: (A) enrichment −log10(P-value) by murine kidney cell type (EC, Podocyte, PT, LOH, DCT, CD-Trans, PC, IC, Fibroblast, Macrophage, Neutrophil, B lymphocyte, T lymphocyte, NK, novel1, novel2). (B) Heatmap of expression z-scores across the same cell types for the following genes, with mouse homologs in parentheses: CYP4A11 (Cyp4a32), RPS6KA2 (Rps6ka2), ACOT2 (Acot2), NAT8 (Nat8), CYP2D6 (Cyp2d10), ACY3 (Acy3), ACOX1 (Acox1), DHTKD1 (Dhtkd1), SLC17A1 (Slc17a1), DPEP1 (Dpep1), CYP2C8 (Cyp2j5), ACSM2A (Acsm2), SLC7A9 (Slc7a9), AACS (Aacs), SLC5A9 (Slc5a9), GGT1 (Ggt1), DECR2 (Decr2), GLDC (Gldc), PYROXD2 (Pyroxd2), FMO4 (Fmo4), ACOT4 (Acot4), AOC1 (Aoc1), CYP2C8 (Cyp2j7), NAALAD2 (Naalad2), ACOT2 (Acot3), GSTM2 (Gstm7), GOT2 (Got2), ADH1A (Adh1), SULT2A1 (Sult2a3), PAOX (Paox), BST1 (Bst1), HDAC10 (Hdac10), SLC28A2 (Gm14085), ACOT2 (Acot1), FOLH1 (Folh1), CYP4A11 (Cyp4a31), AFMID (Afmid), ACADL (Acadl).]

(A): Enrichment testing showed that associated genes are highly expressed in cells of the proximal tubule in mice. The vertical line indicates the statistical significance threshold after Bonferroni adjustment; the arrow indicates p-value <1e-8. (B): Heatmap illustrates the relative expression of each associated gene across the murine kidney cell types; only genes with z-score >2 in at least one cell type are plotted. The mouse gene homologs are provided in parentheses. EC: endothelial cells; PT: proximal tubule; LOH: loop of Henle; DCT: distal convoluted tubule; PC: principal cells; IC: intercalated cells; CD-Trans: collecting duct transient cells; NK: natural killer cells.

Supplementary Figure 6: Association between the index SNP at SLC7A9 and pair-wise metabolite ratios reveals transported substrates in vivo

The figure uses color-coding to show the strength of associations (test statistics scaled to [-1,1]) between genotype and the 83 ratios that contained information beyond the associations of their individual components (P-gain>6,728,320, Methods), based on association analysis of 672,832 pair-wise metabolite ratios (1172*1171/2, excl. 13,374 ratios with <300 measurements). Test statistics of results that did not confer additional information (P-gain≤6,728,320) are uniformly presented in gray. The metabolite on the Y-axis represents the numerator and on the X-axis the denominator of the respective ratio. Super-pathways: 01 amino acid, 02 carbohydrate, 04 energy, 05 lipid, 06 nucleotide, 08 peptide, 09 unknown. Metabolites that are a member of more than four associated metabolite ratios with a scaled test statistic >0.5 (absolute) are marked in bold. The T allele at rs12460876 was associated with higher gene expression, in agreement with greater tubular reuptake of lysine, resulting in lower urinary levels.

Supplementary Figure 7: Overview and examples of metabolite clustering

[Figure: (A) dendrogram of the metabolite clustering, Y-axis: Height from 0.0 to 1.0; (B, D) modules ME193 and ME161; (C, E) eigenmetabolite values of ME193 and ME161 by genotype (AA, AG, GG) at rs2147896 and rs13538, respectively.]

Panel A shows the dendrogram of the metabolite clustering. The band of color indicates membership of each of the 1,172 metabolites in one of 212 clustered metabolite modules. Panel B illustrates module ME193, for which metabolites are labeled. Panel C displays the distribution of the eigenmetabolite of ME193 (Y-axis) with genotype at rs2147896 in PYROXD2 (X-axis). Horizontal lines indicate medians. Panel D illustrates module ME161, for which metabolites are labeled. Panel E displays the distribution of the eigenmetabolite of ME161 (Y-axis) with genotype at rs13538 in NAT8 (X-axis). Horizontal lines indicate medians.

Supplementary Figure 8: Circular presentation of genetic associations with eigenmetabolites

[Figure: circular plot by chromosome (1 to 22). Gene labels: COMT, GGT1, AKR7A2, CPT2, ACADM, PLPPR4, SLC7A9, CYP4F2, RAB11FIP5, NAT8, ACADL, CPS1, GBA3, AGXT2, ACADS, SLC17A1/A3/A4, BTN3A1, SLCO1B1, ACY3, NAALAD2, PAOX, CYP3A5, CYP3A7, NAT2, PYROXD2, PPP2R4, GLDC. Bands: −log10(p-value), R², and number of metabolites. Super-pathway color key: Amino Acid, Cofactors and Vitamins, Energy, Lipid, Nucleotide, Peptide, Xenobiotics, Unknown/Partially Characterized. Circle-size key: R² < 0.1, 0.1 ≤ R² < 0.25, 0.25 ≤ R² < 0.5, R² ≥ 0.5.]

The light red band shows the -log10(P-value) for genetic associations with eigenmetabolite concentrations, representing their respective module, by chromosomal position. Associations of all 212 eigenmetabolites are overlaid in the red band, and are capped at P=1e-60. The blue line indicates genome-wide significance (P=2.4e-10). Black gene labels indicate genetic regions in which all members of a given module were also identified in the single metabolite mGWAS; orange labels indicate genetic regions where additional metabolites were implicated as members of a module. The light green band shows the maximum variance in eigenmetabolite levels explained by the index SNP at each genetic region by dark green circles, with the sizes of circles corresponding to different ranges of explained variance. The inner blue band shows a stacked representation of the number of implicated metabolites in each genetic region, colored according to the super-pathways to which they belong, and the number of modules in the genetic region is given next to it. Color keys of metabolite super-pathways are presented in the middle.

Supplementary Figure 9: Identification of the unknown metabolite X-13689 as the glucuronide of alpha-CMBHC

The extracted ion chromatograms (upper right) show the same retention time for both the unknown metabolite in a reference urine matrix (“neat urine”) and the candidate molecule in a neat solution (“neat synthetic”). The MS/MS fragmentation spectra of the candidate molecule (lower left) and of the unknown metabolite (lower right) show the same fragments with equal relative intensities; consequently, the candidate molecule is verified. The m/z (observed) for X-13689 is 495.22438, and the m/z (predicted) for alpha-CMBHC glucuronide is 495.22357, representing a 1.6 ppm error. The 319.1921 fragment peak represents the loss of glucuronic acid (a loss of 176), from which a loss of CO₂ (-43.9898) yields the 275.201 fragment peak.
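The mass accuracy quoted in this legend can be checked with a few lines of Python. The function name is ours; the m/z values are those given in the legend.

```python
def ppm_error(observed_mz, predicted_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - predicted_mz) / predicted_mz * 1e6


# Observed m/z of X-13689 versus predicted m/z of the glucuronide.
error = ppm_error(observed_mz=495.22438, predicted_mz=495.22357)
print(round(error, 1))  # 1.6

# Subtracting the stated CO2 loss from the 319.1921 fragment.
fragment_after_co2_loss = 319.1921 - 43.9898
print(round(fragment_after_co2_loss, 4))  # 275.2023
```

The difference of 0.00081 m/z units corresponds to about 1.6 ppm, in line with the legend, and the computed fragment mass of about 275.202 lies close to the reported 275.201 peak.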

Supplementary Figure 10: Presence of co-localizing association signals for urinary metabolites and phenotypes and diseases in the UK Biobank

[Figure: matrix of index SNPs, grouped by gene, against UK Biobank traits and diseases. Genes shown: ALPL, AOC1, CABP5, CCDC101, CDA, COMT, CPS1, CYP3A5, CYP3A7, DDAH1, DPEP1, HAL, LCT, NAT2, NAT8, NUPR1, PPP2R4, SLC17A1/A3/A4, SLC22A1, SLC6A13, SLC7A9, SLCO1B1, SUCLG2, SULT2A1, TMPRSS11E, UGT2B15, ZKSCAN5. Symbol legend: 5e-10 <= p < 5e-8 & H4 >= 0.8; 1e-50 < p <= 5e-10 & H4 >= 0.8; p <= 1e-50 & H4 >= 0.8.]

Co-localizing associations (H4≥0.8) that showed associations at genome-wide significance (P<5e-8) with both metabolites and traits and diseases in the UK Biobank were found between 68 traits and 66 of the index SNPs. The strength of the associations, based on their association p-values with the UK Biobank trait, is indicated by cross or asterisk as described in the legend. The traits are sorted into five groups: blood count-based parameters, anthropometry, life-style, medical conditions, and skin color (top to bottom). SNPs are sorted by gene and within gene by position.

References:

1. Major, J.M. et al. Genome-wide association study identifies three common variants associated with serologic response to vitamin E supplementation in men. J Nutr 142, 866-71 (2012).
2. Major, J.M. et al. Genome-wide association study identifies common variants associated with circulating vitamin E levels. Hum Mol Genet 20, 3876-83 (2011).
3. Takeuchi, F. et al. A genome-wide association study confirms VKORC1, CYP2C9, and CYP4F2 as principal genetic determinants of warfarin dose. PLoS Genet 5, e1000433 (2009).
4. Li, Y. et al. Genome-Wide Association Studies of Metabolites in Patients with CKD Identify Multiple Loci and Illuminate Tubular Transport Mechanisms. J Am Soc Nephrol 29, 1513-1524 (2018).
5. Bertran, J. et al. Expression cloning of a cDNA from rabbit kidney cortex that induces a single transport system for cystine and dibasic and neutral amino acids. Proc Natl Acad Sci U S A 89, 5601-5 (1992).
6. Veiga-da-Cunha, M. et al. Molecular identification of NAT8 as the enzyme that acetylates cysteine S-conjugates to mercapturic acids. J Biol Chem 285, 18888-98 (2010).