SUPPLEMENTARY TEXT

FIGURES

[Figure S1 image: study design flowchart. Text recoverable from the boxes:
GlycA: complex biomarker, unknown biology.
Human cohorts profiled: DILGOM (N = 518); YFS surveys 2001, 2007 and 2011 (N = 2,245, N = 2,159 and N = 2,046); FINRISK97 (N = 7,599).
Test established hypotheses ("Inflammation underlies GlycA"): GlycA vs. acute-phase glycoproteins; GlycA over time; GlycA vs. cytokines; GlycA vs. infection.
Hypothesis-free discovery of new biology: GlycA vs. coexpression networks; replication of coexpression networks.
Identify function: GO term enrichment, literature curation.
The Neutrophil Degranulation Module (NDM).
Test predicted biology: associations among GlycA, NDM expression, leukocyte count and infection.
Test new hypotheses (chronic overactive immune response predicts risk of hospitalisation and death from infection-related events): GlycA vs. electronic health records.]

Figure S1: Overall study design. Boxes denote key associations tested in the paper. Box colour corresponds to cohort the association was tested in.


[Figure S2 image: box plots of standardised GlycA for participants with and without febrile infection, one panel per YFS survey. Values printed on the panels:]

Survey   Difference   95% CI         P-value      N (no infection)   N (infection)
2001     0.44         0.24 to 0.65   3 x 10^-5    2,077              112
2007     0.40         0.22 to 0.59   4 x 10^-5    2,024              95
2011     0.38         0.17 to 0.58   3 x 10^-4    1,906              93

Figure S2: Box plots of GlycA comparing participants reporting febrile infection in the two weeks prior to blood sampling to those reporting no febrile infection for each adulthood survey of the YFS cohort. Reported differences indicate the mean elevation of GlycA in SD-units for those reporting febrile infection compared to those reporting no febrile infection. P-values are from t-tests.
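The comparison described in this caption can be sketched as follows. This is a minimal illustration, not the study's analysis code: the GlycA values are made-up toy numbers, and the use of Welch's unequal-variance t statistic is an assumption, since the caption only says "t-tests". The P-value itself is omitted because the standard library has no t-distribution.

```python
import math
import statistics

def standardise(values):
    """Return z-scores: subtract the mean, divide by the sample SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def welch_t(group_a, group_b):
    """Difference in means (a - b) and Welch's t statistic (assumed test variant)."""
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return diff, diff / se

# Hypothetical GlycA values and febrile-infection flags (not study data).
glyca = [1.20, 1.35, 1.10, 1.60, 1.25, 1.70, 1.15, 1.30]
infected = [False, False, False, True, False, True, False, False]

# Standardise across all participants, then compare the two groups,
# so the difference is expressed in SD-units of GlycA.
z = standardise(glyca)
z_inf = [v for v, flag in zip(z, infected) if flag]
z_none = [v for v, flag in zip(z, infected) if not flag]
difference_sd, t_stat = welch_t(z_inf, z_none)
print(f"difference = {difference_sd:.2f} SD, t = {t_stat:.2f}")
```

Standardising before splitting into groups is what makes the reported difference read as "mean elevation in SD-units".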

[Figure S3 image: probe coexpression heatmaps for Module B in a) DILGOM and b) YFS2011.]

c) Replication of GlycA association

Cohort    Magnitude   95% CI             P-value
DILGOM    -0.14       -0.21 to -0.071    1 x 10^-4
YFS2011   -0.072      -0.11 to -0.036    1 x 10^-4

Figure S3: Replication of Module B. Probe coexpression (Spearman correlation) in the DILGOM cohort (a) and replication in the YFS2011 cohort (b). c) Replication of GlycA association in both cohorts. Associations were assessed using a linear regression of GlycA on the module summary expression profile adjusting for age, sex, and triglyceride levels. Magnitude denotes difference in SD-units of log-transformed GlycA per SD increase of Module B summary expression in the respective cohort. GlycA and triglyceride levels were log transformed. All continuous measurements were standardised.


[Figure S4 image: scatter plot of hazard ratios for infection-related diagnosis groups (labelled with ICD-10 codes), plotted separately for risk of mortality and risk of hospitalization. x-axis: hazard ratio per 1 SD log(CRP); y-axis: hazard ratio per 1 SD log(GlycA); axis ticks run from 0.8 to 2.7.]

Figure S4: Comparison between the predictive ability of GlycA and CRP in relation to infectious diseases in the FINRISK97 cohort. Hazard ratios for risk of mortality (red) or hospitalization (white) conferred per SD increment of GlycA (y-axis) and CRP (x-axis) for infection-related diagnoses in the FINRISK97 cohort. Both GlycA and CRP were log transformed. Hazard ratios that lie inside the grey triangles denote events for which CRP is more predictive than GlycA. 7,599 individuals were followed over a 13.8-year follow-up period, and Cox models were adjusted for age, sex, triglycerides and incidence of the same diagnosis in the 10 years prior to sample collection.

[Figure S5 image: scatter plot of hazard ratios for infection-related diagnosis groups (labelled with ICD-10 codes), plotted separately for risk of mortality and risk of hospitalization. x-axis: hazard ratio per 1 SD log(GlycA); y-axis: hazard ratio per 1 SD log(GlycA adjusted for CRP); axis ticks run from 0.6 to 4.5.]

Figure S5: Comparison between the predictive ability of GlycA and GlycA adjusted for CRP in relation to infectious diseases in the FINRISK97 cohort. Hazard ratios for risk of mortality (red) or hospitalization (white) conferred per SD increment of GlycA (x-axis) and GlycA adjusted for CRP (y-axis), for infection-related diagnoses in the FINRISK97 cohort. Both GlycA and CRP were log transformed. Hazard ratios that lie close to the diagonal denote events for which the GlycA association is only weakly attenuated by adjustment for CRP. 7,599 individuals were followed for a median of 13.8 years, and Cox models were adjusted for sex, triglycerides and incidence of the same diagnosis in the 10 years prior to sample collection.

[Figure S6 image: forest plot with one row per infection-related diagnosis (ICD-10 codes in brackets), giving the hazard ratio (95% CI) per 1 SD log(GlycA), the P-value and the number of events, for risk of mortality and risk of hospitalization. Diagnosis groups shown: nonlocalized infections [A00–B99], intestinal infections [A00–A09], tuberculosis [A15–A19], other bacterial diseases [A30–A49], viral fevers [A90–A99], viral infections with lesions [B00–B09], other viral infections [B25–B34], fungal infections [B35–B49], respiratory infections [J00–J22], central nervous system infections [G00–G09], cardiac infections [I30, I32, I33, I40, I41], localized skin infections [L00–L08], bone and joint infections [M00–M03, M86] and urinary system infections [N10, N11, N30, N34, N39], each with its sub-diagnoses. x-axis ticks run from 0.6 to 4.5.]

Figure S6: Detailed breakdown of GlycA-associated risk for infection-related diagnoses in the FINRISK97 cohort. Hazard ratios for risk of mortality (red) or hospitalization (white) conferred per SD increment of log-transformed GlycA for infection-related diagnoses with more than 10 events in the FINRISK97 study. 7,599 apparently healthy individuals from the general population were prospectively observed over a 13.8-year follow-up period. Cox models were adjusted for age, sex, triglycerides and incidence of the same diagnosis in the 10 years prior to sample collection.

TABLES

Table S1: Association of assayed acute-phase glycoproteins with GlycA levels in the DILGOM cohort.

Glycoprotein                 Association magnitude   95% confidence interval   P-value
Alpha-1-acid glycoprotein    0.40                    0.32 to 0.47              2 x 10^-23
Haptoglobin                  0.26                    0.18 to 0.33              4 x 10^-11
Transferrin                  0.18                    0.12 to 0.25              5 x 10^-8
Alpha-1 antitrypsin          0.13                    0.069 to 0.20             7 x 10^-5

Multivariable linear regression of GlycA on the four assayed acute-phase glycoproteins, adjusted for age and sex (Methods). Association magnitudes denote the difference in SD-units of GlycA per SD increase of each glycoprotein. GlycA and the four assayed glycoproteins were log-transformed.

Table S2: Associations of cytokines and CRP with GlycA in the YFS2007 cohort.

Name / Symbol      Cytokine                                  Association magnitude  95% confidence interval  P-value
HGF                Hepatocyte growth factor                  0.34                   0.30 to 0.38             1 x 10^-57
IL-18              Interleukin-18                            0.23                   0.19 to 0.28             5 x 10^-25
MIP-1β / CCL4      Macrophage inflammatory protein-1 beta    0.18                   0.14 to 0.23             8 x 10^-17
CTACK / CCL27      Cutaneous T-cell attracting chemokine     -0.17                  -0.21 to -0.13           9 x 10^-15
IL-2Rα             Interleukin-2 receptor alpha              0.17                   0.12 to 0.21             1 x 10^-13
IL-8               Interleukin-8                             0.16                   0.12 to 0.20             3 x 10^-13
MIG / CXCL9        Monokine induced by interferon-gamma      0.15                   0.11 to 0.20             4 x 10^-12
IL-9               Interleukin-9                             0.14                   0.099 to 0.19            2 x 10^-10
bFGF / FGF2        Basic fibroblast growth factor            0.13                   0.087 to 0.17            5 x 10^-9
IP-10 / CXCL10     Interferon gamma-induced protein 10       0.13                   0.086 to 0.17            6 x 10^-9
β-NGF / NGF        Beta nerve growth factor                  0.13                   0.085 to 0.17            1 x 10^-8
MIF                Macrophage migration inhibitory factor    0.13                   0.082 to 0.17            1 x 10^-8
GROα / CXCL1       Growth regulated oncogene-alpha           0.12                   0.081 to 0.17            3 x 10^-8
IL-5               Interleukin-5                             0.12                   0.079 to 0.17            4 x 10^-8
VEGF               Vascular endothelial growth factor        0.12                   0.078 to 0.17            6 x 10^-8
PDGF-BB            Platelet derived growth factor BB         0.12                   0.077 to 0.16            6 x 10^-8
IL-7               Interleukin-7                             0.11                   0.070 to 0.16            3 x 10^-7
IL1RA              Interleukin-1 receptor antagonist         0.11                   0.067 to 0.15            7 x 10^-7
MIP-1α / CCL3      Macrophage inflammatory protein-1 alpha   0.11                   0.062 to 0.15            2 x 10^-6
IL-2               Interleukin-2                             0.10                   0.059 to 0.15            4 x 10^-6
IL-13              Interleukin-13                            0.099                  0.055 to 0.14            1 x 10^-7
IL-6               Interleukin-6                             0.096                  0.052 to 0.14            2 x 10^-5
SCF / KITLG        Stem cell factor                          0.095                  0.051 to 0.14            3 x 10^-5
IL-4               Interleukin-4                             0.092                  0.049 to 0.14            4 x 10^-5
IL-1β              Interleukin-1-beta                        0.089                  0.046 to 0.13            6 x 10^-5
IL-17 / IL-17A     Interleukin-17                            0.086                  0.042 to 0.13            1 x 10^-4
IL-12              Interleukin-12 heterodimer                0.083                  0.040 to 0.13            2 x 10^-4
SCGFβ / CLEC11A    Stem cell growth factor beta              0.084                  0.039 to 0.13            2 x 10^-4
GCSF / CSF3        Granulocyte colony-stimulating factor     0.081                  0.037 to 0.12            3 x 10^-4
IL-10              Interleukin-10                            0.079                  0.035 to 0.12            4 x 10^-4
MCP-1 / CCL2       Monocyte chemotactic protein-1            0.072                  0.028 to 0.12            0.001
TRAIL / TNFSF10    TNF-related apoptosis-inducing ligand     0.075                  0.029 to 0.12            0.001
IFNγ               Interferon-gamma                          0.071                  0.028 to 0.11            0.001
IL-16              Interleukin-16                            0.073                  0.028 to 0.12            0.001
TNF-α / TNF        Tumor necrosis factor-alpha               0.051                  0.0076 to 0.094          0.02
Eotaxin-1 / CCL11  Eotaxin-1                                 -0.021                 -0.066 to 0.023          0.3
CRP                C-reactive protein                        0.45                   0.42 to 0.49             6 x 10^-105

Linear regression of GlycA on each cytokine or CRP for participants from the YFS2007 cohort reporting no febrile infection in the two weeks prior to blood sample collection. Models were adjusted for age and sex. GlycA and CRP were log transformed and standardised. A rank based inverse normal transform was applied to each cytokine. Association magnitudes denote the difference in SD-units of GlycA per SD increase of each cytokine or CRP. Symbols are provided where they differ from the official name for each cytokine.
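A rank-based inverse normal transform of the kind mentioned for the cytokines can be sketched as below. The offset of 3/8 (Blom's constant) and the average-rank handling of ties are assumptions made for illustration; the text does not say which variant was used. The cytokine values are made-up toy numbers.

```python
from statistics import NormalDist

def inverse_normal_transform(values, offset=0.375):
    """Map values to standard-normal quantiles of their ranks.

    offset=0.375 (Blom) is an assumed convention, not taken from the source.
    Tied values receive the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current block of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / (n - 2 * offset + 1))
            for r in ranks]

# Hypothetical, strongly skewed cytokine concentrations (not study data).
cytokine = [3.1, 250.0, 4.7, 4.7, 12.9, 0.8]
transformed = inverse_normal_transform(cytokine)
print([round(v, 2) for v in transformed])
```

Because only ranks enter the transform, the extreme value 250.0 no longer dominates the scale, which is the usual motivation for applying it to skewed cytokine measurements.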

Table S3: Longitudinal associations of GlycA levels over 10 years across the three adulthood surveys of the YFS cohort.

Regression      Association magnitude   95% confidence interval   P-value
2011 on 2007    0.47                    0.42 to 0.51              5 x 10^-96
2007 on 2001    0.41                    0.37 to 0.45              2 x 10^-72
2011 on 2001    0.41                    0.37 to 0.46              2 x 10^-78

Linear regression between GlycA in the same individuals across health examination years of the YFS cohort (Methods). Models were adjusted for age and sex. GlycA was natural log transformed and standardised. Association magnitudes denote the difference in SD-units of GlycA per SD increase of GlycA at a previous survey.
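The association magnitudes in this table come from covariate-adjusted linear regressions on log-transformed, standardised GlycA. A minimal sketch of that recipe, using ordinary least squares solved through the normal equations, is shown below. All data are hypothetical toy values, and the helper names (`standardise`, `ols`) are illustrative rather than taken from the study's code.

```python
import math
import statistics

def standardise(values):
    """Return z-scores: subtract the mean, divide by the sample SD."""
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def ols(columns, y):
    """Least-squares coefficients for y ~ intercept + columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]

# Hypothetical GlycA values at two surveys, plus age and sex (not study data).
glyca_2007 = [1.10, 1.25, 1.40, 1.05, 1.60, 1.30, 1.45, 1.20]
glyca_2011 = [1.15, 1.20, 1.50, 1.10, 1.55, 1.35, 1.40, 1.30]
age = [30, 33, 36, 39, 42, 45, 30, 36]
sex = [0, 1, 0, 1, 0, 1, 1, 0]

# Natural log transform, then standardise, as described in the table note.
x = standardise([math.log(v) for v in glyca_2007])
y = standardise([math.log(v) for v in glyca_2011])
intercept, beta_glyca, beta_age, beta_sex = ols([x, age, sex], y)
print(f"SD change in 2011 GlycA per SD of 2007 GlycA: {beta_glyca:.2f}")
```

Since both GlycA variables are standardised, `beta_glyca` reads directly as a difference in SD-units per SD increase, which is how the magnitudes in the table are expressed.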

Table S4: GlycA associations with the DILGOM cohort network modules.

Module           Number of probes  Variance explained  Association magnitude  95% confidence interval  P-value
1                8,680             27%                 -0.048                 -0.12 to 0.025           0.2
2                5,403             23%                 -0.025                 -0.095 to 0.046          0.5
3                3,258             20%                 0.069                  -0.0039 to 0.14          0.06
4                1,775             26%                 -0.0041                -0.075 to 0.066          0.9
5                1,734             29%                 -0.085                 -0.15 to -0.017          0.02
6                1,019             33%                 0.089                  0.020 to 0.16            0.01
7                604               35%                 0.013                  -0.059 to 0.084          0.7
8                598               23%                 0.036                  -0.038 to 0.11           0.3
9                545               34%                 -0.016                 -0.086 to 0.054          0.7
10               339               36%                 0.099                  0.028 to 0.17            0.007
11               265               25%                 0.010                  -0.060 to 0.080          0.8
12               255               34%                 -0.086                 -0.16 to -0.015          0.02
13               239               47%                 0.080                  0.0099 to 0.15           0.03
14               225               42%                 0.12                   0.045 to 0.19            0.001
15               208               29%                 -0.12                  -0.19 to -0.049          0.001
16               179               42%                 -0.033                 -0.11 to 0.040           0.4
17               138               40%                 0.057                  -0.013 to 0.13           0.1
18               112               51%                 0.062                  -0.0076 to 0.13          0.08
19               84                39%                 0.0064                 -0.066 to 0.079          0.9
20               80                43%                 0.017                  -0.054 to 0.087          0.6
21               77                40%                 0.025                  -0.046 to 0.097          0.5
22               67                49%                 -0.085                 -0.15 to -0.016          0.02
23               64                41%                 -0.063                 -0.13 to 0.0064          0.08
24               63                29%                 0.069                  0.00052 to 0.14          0.05
25               43                45%                 -0.029                 -0.099 to 0.041          0.4
26               40                56%                 0.016                  -0.054 to 0.085          0.7
27               39                42%                 0.071                  0.000083 to 0.14         0.05
28               34                46%                 0.027                  -0.043 to 0.097          0.4
Module A (NDM)   31                58%                 0.15                   0.077 to 0.22            4 x 10^-5
30               31                47%                 -0.017                 -0.086 to 0.051          0.6
31               31                44%                 -0.080                 -0.15 to -0.011          0.02
32               30                42%                 -0.092                 -0.16 to -0.022          0.01
33               28                44%                 0.043                  -0.028 to 0.11           0.2
34               25                54%                 -0.017                 -0.089 to 0.054          0.6
Module C         20                55%                 0.12                   0.048 to 0.19            9 x 10^-4
36               19                50%                 0.086                  0.017 to 0.15            0.02
37               18                62%                 0.040                  -0.029 to 0.11           0.3
38               15                54%                 -0.029                 -0.11 to 0.051           0.5
Module B         10                47%                 -0.14                  -0.21 to -0.071          1 x 10^-4
40               10                60%                 0.028                  -0.042 to 0.098          0.4

Coexpression modules discovered in the DILGOM cohort through WGCNA and their associations with GlycA from a linear regression on their summary expression profile, adjusting for age, sex, and triglyceride levels (Methods). Variance explained indicates how well the summary expression profile (Methods) characterises the module, denoting the proportion of the variance of the gene expression it explains. Association magnitudes denote difference in SD-units of log-transformed GlycA per SD increase in each module’s summary expression profile. Modules with P < 0.001 (Bonferroni adjusting for the number of modules) have been bolded and named.
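The "variance explained" column can be illustrated with a small sketch that takes the first principal component of a module's standardised probe expression as its summary profile and reports the share of variance it captures. This is a hedged illustration only: the expression values are hypothetical, per-probe standardisation is an assumption, and the leading component is found by simple power iteration rather than by whatever routine the WGCNA analysis used.

```python
import math
import statistics

def standardise(values):
    """Return z-scores: subtract the mean, divide by the sample SD."""
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def first_principal_component(probes, iterations=500):
    """probes: one list of expression values per probe (same samples in each).

    Returns (summary_profile, proportion_of_variance_explained).
    """
    z = [standardise(p) for p in probes]
    n, p = len(z[0]), len(z)
    # Probe-by-probe correlation matrix of the standardised data.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]
    # Power iteration for the leading eigenvector. A uniform start vector
    # works for positively coexpressed probes, as in this toy example.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p))
                     for i in range(p))
    profile = [sum(v[j] * z[j][s] for j in range(p)) for s in range(n)]
    # The trace of a correlation matrix equals the number of probes.
    return profile, eigenvalue / p

# Hypothetical expression of three coexpressed probes in six samples.
probes = [
    [5.1, 6.0, 7.2, 8.1, 9.0, 9.8],
    [2.0, 2.9, 3.1, 4.2, 4.8, 6.1],
    [7.5, 7.0, 8.8, 9.1, 9.0, 10.6],
]
profile, explained = first_principal_component(probes)
print(f"variance explained by the summary profile: {explained:.0%}")
```

A tightly coexpressed module yields a proportion close to 100%, whereas large, loosely connected modules (such as Modules 1 to 4 in the table) give lower values.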

Table S5: Replication of GlycA-associated modules in the YFS2011 cohort.

a) Replication of network topology

Statistic                                  Module A (NDM)   Module B         Module C
Density                                    ≤ 1 x 10^-6      6 x 10^-4        ≤ 1 x 10^-6
Proportion of variance explained           ≤ 1 x 10^-6      1 x 10^-4        ≤ 1 x 10^-6
Correlation of coexpression                ≤ 1 x 10^-6      3 x 10^-4        ≤ 1 x 10^-6
Correlation of intramodular connectivity   ≤ 1 x 10^-6      0.02             0.01
Correlation of module membership           ≤ 1 x 10^-6      0.006            ≤ 1 x 10^-6
Mean sign-aware coexpression               ≤ 1 x 10^-6      4 x 10^-6        ≤ 1 x 10^-6
Mean sign-aware module membership          ≤ 1 x 10^-6      ≤ 1 x 10^-6      ≤ 1 x 10^-6

b) Replication of GlycA associations

Cohort    Module           Number of probes  Variance explained  Association magnitude  95% confidence interval  P-value
DILGOM    Module A (NDM)   31                58%                 0.15                   0.077 to 0.22            4 x 10^-5
DILGOM    Module B         10                47%                 -0.14                  -0.21 to -0.071          1 x 10^-4
DILGOM    Module C         20                55%                 0.12                   0.048 to 0.19            9 x 10^-4
YFS2011   Module A (NDM)   31                56%                 0.12                   0.084 to 0.16            2 x 10^-10
YFS2011   Module B         10                42%                 -0.072                 -0.11 to -0.036          1 x 10^-4
YFS2011   Module C         20                60%                 0.046                  0.0096 to 0.082          0.01

a) P-values for replication of module network topology (Methods) for the GlycA-associated modules (Modules A, B, and C). NDM: the neutrophil degranulation module. b) Linear regressions of GlycA on module summary expression (Principal Component 1; Methods) in the DILGOM and YFS2011 cohorts, adjusting for age, sex, and triglycerides. Variance explained indicates how well the summary expression profile characterises the module, denoting the proportion of the variance of the corresponding gene expression subset it explains. Association magnitudes denote difference in SD-units of GlycA per SD increase in each module's summary expression profile. Modules were considered to replicate if P < 0.05 for all module preservation statistics (a) and P < 0.001 for module–GlycA associations in both the DILGOM and YFS2011 cohorts (b) (Modules A and B were replicated).
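The replication rule stated in the Table S5 note can be checked mechanically against the tabulated P-values. The sketch below transcribes those P-values (entries reported as "≤ 1 x 10^-6" are entered as 1e-6) and applies the two thresholds from the note; the dictionary layout and the function name `replicates` are illustrative choices, not part of the original analysis.

```python
# Topology P-values per module, in the row order of Table S5a.
topology_p = {
    "Module A (NDM)": [1e-6, 1e-6, 1e-6, 1e-6, 1e-6, 1e-6, 1e-6],
    "Module B": [6e-4, 1e-4, 3e-4, 0.02, 0.006, 4e-6, 1e-6],
    "Module C": [1e-6, 1e-6, 1e-6, 0.01, 1e-6, 1e-6, 1e-6],
}
# Module-GlycA association P-values per cohort, from Table S5b.
glyca_p = {
    "Module A (NDM)": {"DILGOM": 4e-5, "YFS2011": 2e-10},
    "Module B": {"DILGOM": 1e-4, "YFS2011": 1e-4},
    "Module C": {"DILGOM": 9e-4, "YFS2011": 0.01},
}

def replicates(module, topology_alpha=0.05, association_alpha=0.001):
    """Apply both criteria: every preservation statistic below
    topology_alpha and the GlycA association below association_alpha
    in both cohorts."""
    topology_ok = all(p < topology_alpha for p in topology_p[module])
    association_ok = all(p < association_alpha
                         for p in glyca_p[module].values())
    return topology_ok and association_ok

replicated = [module for module in topology_p if replicates(module)]
print(replicated)  # ['Module A (NDM)', 'Module B']
```

Module C passes the topology criteria but fails on the YFS2011 association (P = 0.01), which is why only Modules A and B are reported as replicated.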

Table S6: Significantly enriched Gene Ontology (GO) Biological Process terms for Module B.

GO ID       #AG   P-value     Genes             GO biological process
GO:0045806  9     7 x 10^-5   PACSIN1, SCAMP5   Negative regulation of endocytosis
GO:0008355  1     0.003       DRD4              Olfactory learning
GO:0019511  1     0.003       LEPREL1           Peptidyl-proline hydroxylation
GO:0050848  4     0.004       DRD4              Regulation of calcium-mediated signaling
GO:0045956  4     0.004       SCAMP5            Positive regulation of calcium ion-dependent exocytosis
GO:0007195  4     0.004       DRD4              Inhibition of adenylate cyclase activity by dopamine receptor signaling pathway
GO:0048149  4     0.004       DRD4              Behavioral response to ethanol
GO:0032417  2     0.004       DRD4              Positive regulation of sodium:hydrogen antiporter activity
GO:0051586  2     0.004       DRD4              Positive regulation of dopamine uptake
GO:0042053  7     0.004       DRD4              Regulation of dopamine metabolic process
GO:0007212  7     0.004       DRD4              Dopamine receptor signaling pathway
GO:0048148  7     0.004       DRD4              Behavioral response to cocaine
GO:0042596  5     0.004       DRD4              Fear response
GO:0032963  5     0.004       LEPREL1           Collagen metabolic process
GO:0030818  9     0.004       DRD4              Negative regulation of cAMP biosynthetic process
GO:0050709  9     0.004       DRD4              Negative regulation of protein secretion
GO:0051927  9     0.004       DRD4              Negative regulation of calcium ion transport via voltage-gated calcium channel activity
GO:0060080  9     0.004       DRD4              Regulation of inhibitory postsynaptic membrane potential
GO:0033674  6     0.004       DRD4              Positive regulation of kinase activity
GO:0007614  6     0.004       DRD4              Short-term memory
GO:0042417  10    0.004       DRD4              Dopamine metabolic process
GO:0001963  10    0.004       DRD4              Synaptic transmission, dopaminergic
GO:0034776  3     0.005       DRD4              Response to histamine
GO:0034976  13    0.005       SCAMP5            Response to endoplasmic reticulum stress
GO:0050482  13    0.005       DRD4              Arachidonic acid secretion
GO:0046928  14    0.005       DRD4              Regulation of neurotransmitter secretion
GO:0001662  15    0.005       DRD4              Behavioral fear response
GO:0007194  18    0.006       DRD4              Negative regulation of adenylate cyclase activity
GO:0051402  19    0.006       TNFRSF21          Neuron apoptosis
GO:0050715  21    0.007       SCAMP5            Positive regulation of cytokine secretion
GO:0035176  26    0.008       DRD4              Social behavior
GO:0001975  29    0.009       DRD4              Response to amphetamine
GO:0008344  32    0.009       DRD4              Adult locomotory behavior
GO:0006887  51    0.01        SCAMP5            Exocytosis
GO:0006874  63    0.02        DRD4              Cellular calcium ion homeostasis
GO:0000187  84    0.02        DRD4              Activation of MAPK activity
GO:0007010  103   0.03        PACSIN1           Cytoskeleton organization
GO:0006897  109   0.03        PACSIN1           Endocytosis
GO:0044255  128   0.03        TNFRSF21          Cellular lipid metabolic process

GO Biological Process terms significantly over-represented in Module B (N = 9 annotated genes). P-values were obtained from a Hypergeometric test and were false discovery rate (FDR) corrected. GO ID: unique identifier used by the Gene Ontology consortium. #AG: the total number of genes in the reference set (N = 41,264 genes) annotated with the corresponding term.
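The enrichment test named in this note can be sketched with the standard library alone. The example below computes the hypergeometric upper-tail probability for the first row of Table S6 (2 of the 9 annotated module genes carry a term that annotates 9 of the 41,264 reference genes) and includes a Benjamini-Hochberg adjustment. Benjamini-Hochberg is an assumption here, as the note says only "false discovery rate (FDR) corrected"; the resulting raw P-value is therefore not expected to match the corrected value in the table.

```python
from math import comb

def hypergeom_upper_tail(k, module_size, annotated, reference_size):
    """P(X >= k) when module_size genes are drawn without replacement
    from reference_size genes, of which `annotated` carry the term."""
    total = comb(reference_size, module_size)
    upper = min(module_size, annotated)
    hits = sum(comb(annotated, x)
               * comb(reference_size - annotated, module_size - x)
               for x in range(k, upper + 1))
    return hits / total

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in input order
    (assumed FDR procedure; the source does not name one)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# GO:0045806 in Module B: 2 module genes (PACSIN1, SCAMP5), #AG = 9.
p_raw = hypergeom_upper_tail(k=2, module_size=9, annotated=9,
                             reference_size=41264)
print(f"raw hypergeometric P = {p_raw:.2e}")

# Toy illustration of the adjustment on three made-up P-values.
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

In a full analysis the adjustment would be applied across the raw P-values of all tested terms, so a single term's corrected value depends on how many terms were tested.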

Table S7: Significantly enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways in Module A (the Neutrophil Degranulation Module; NDM).

KEGG ID      KEGG pathway                   #AG   Genes             P-value
Kegg:05146   Amoebiasis                     102   SERPINB10, CTSG   0.01
Kegg:04145   Phagosome                      135   MPO, OLR1         0.01
Kegg:05322   Systemic lupus erythematosus   87    CTSG, ELANE       0.02
Kegg:04614   Renin-angiotensin system       17    CTSG              0.03
Kegg:05310   Asthma                         24    RNASE3            0.04
Kegg:02010   ABC transporters               42    ABCA13            0.05

KEGG pathways significantly over-represented in Module A; the Neutrophil Degranulation Module (NDM) (N = 26 genes with annotations). P-values were obtained from a Hypergeometric test and were false discovery rate (FDR) corrected. #AG: the total number of genes in the reference set (N = 41,264 genes) annotated with the corresponding pathway.


Table S8: Significantly enriched GO Biological Process terms in Module A (NDM).

GO ID       #AG   P-value     Genes                                          GO biological process
GO:0042742  80    4 x 10^-12  RNASE3, DEFA1B, LTF, BPI, DEFA4, DEFA3, CAMP   Defense response to bacterium
GO:0050832  16    1 x 10^-11  DEFA1B, CTSG, DEFA4, MPO, DEFA3                Defense response to fungus
GO:0031640  11    1 x 10^-6   DEFA1B, DEFA4, DEFA3                           Killing of cells of other organism
GO:0044130  13    1 x 10^-6   CTSG, MPO, CAMP                                Negative regulation of growth of symbiont in host
GO:0006508  519   3 x 10^-4   PRTN3, MMP8, OLR1, LTF, CTSG                   Proteolysis
GO:0050829  14    5 x 10^-4   AZU1, CAMP                                     Defense response to Gram-negative bacterium
GO:0006955  314   5 x 10^-4   DEFA1B, CTSG, BPI, CEACAM8                     Immune response
GO:0030574  20    6 x 10^-4   PRTN3, MMP8                                    Collagen catabolic process
GO:0006401  18    6 x 10^-4   RNASE2, RNASE3                                 RNA catabolic process
GO:0006935  125   7 x 10^-4   RNASE2, DEFA1B, AZU1                           Chemotaxis
GO:0044140  1     0.004       CAMP                                           Negative regulation of growth of symbiont on or near host surface
GO:0002149  1     0.004       MPO                                            Hypochlorous acid biosynthetic process
GO:0050754  1     0.004       AZU1                                           Positive regulation of fractalkine biosynthetic process
GO:0050725  1     0.004       AZU1                                           Positive regulation of interleukin-1 beta biosynthetic process
GO:0051873  2     0.005       CAMP                                           Killing by host of symbiont cells
GO:0015891  2     0.005       LCN2                                           Siderophore transport
GO:0045123  2     0.005       AZU1                                           Cellular extravasation
GO:0097029  2     0.005       PRTN3                                          Mature dendritic cell differentiation
GO:0002679  2     0.005       MPO                                            Respiratory burst involved in defense response
GO:0070946  2     0.005       CTSG                                           Neutrophil mediated killing of gram-positive bacterium
GO:0015755  2     0.005       SLC2A5                                         Fructose transport
GO:0043031  4     0.009       BPI                                            Negative regulation of macrophage activation
GO:0001878  4     0.009       MPO                                            Response to yeast
GO:0001774  4     0.009       AZU1                                           Microglial cell activation
GO:0050765  5     0.01        PRTN3                                          Negative regulation of phagocytosis
GO:0010952  5     0.01        PCOLCE2                                        Positive regulation of peptidase activity
GO:0032717  5     0.01        BPI                                            Negative regulation of interleukin-8 production
GO:0032496  129   0.01        CTSG, MPO                                      Response to lipopolysaccharide
GO:0009615  140   0.01        DEFA1B, DEFA3                                  Response to virus
GO:0008347  6     0.01        AZU1                                           Glial cell migration
GO:0042117  6     0.01        AZU1                                           Monocyte activation
GO:0043114  7     0.01        AZU1                                           Regulation of vascular permeability
GO:0050778  8     0.01        CTSG                                           Positive regulation of immune response
GO:0045348  8     0.01        AZU1                                           Positive regulation of MHC class II biosynthetic process
GO:0042535  9     0.01        AZU1                                           Positive regulation of tumor necrosis factor biosynthetic process
GO:0048246  10    0.02        AZU1                                           Macrophage chemotaxis
GO:0019430  10    0.02        MPO                                            Removal of superoxide radicals
GO:0006916  199   0.02        AZU1, MPO                                      Anti-apoptosis
GO:0032094  11    0.02        MPO                                            Response to food
GO:0034374  11    0.02        MPO                                            Low-density lipoprotein particle remodeling
GO:0031581  12    0.02        COL17A1                                        Hemidesmosome assembly
GO:0050930  12    0.02        AZU1                                           Induction of positive chemotaxis
GO:0032715  13    0.02        BPI                                            Negative regulation of interleukin-6 production
GO:0006826  13    0.02        LTF                                            Iron ion transport
GO:0050766  16    0.02        AZU1                                           Positive regulation of phagocytosis
GO:0045861  17    0.02        SERPINB10                                      Negative regulation of proteolysis
GO:0006954  246   0.02        OLR1, AZU1                                     Inflammatory response
GO:0055072  18    0.02        LCN2                                           Iron ion homeostasis
GO:0042744  19    0.02        MPO                                            Hydrogen peroxide catabolic process
GO:0032720  19    0.02        BPI                                            Negative regulation of tumor necrosis factor production
GO:0007159  23    0.03        OLR1                                           Leukocyte cell-cell adhesion
GO:0008152  310   0.03        MMP8, LTF                                      Metabolic process
GO:0045785  27    0.03        AZU1                                           Positive regulation of cell adhesion
GO:0050830  31    0.03        CAMP                                           Defense response to Gram-positive bacterium
GO:0008643  37    0.03        SLC2A5                                         Carbohydrate transport
GO:0007205  37    0.03        AZU1                                           Activation of protein kinase C activity by G-protein coupled receptor protein signaling pathway
GO:0030162  37    0.03        SERPINB10                                      Regulation of proteolysis
GO:0045444  38    0.04        RETN                                           Fat cell differentiation
GO:0006959  35    0.04        LTF                                            Humoral immune response
GO:0042157  36    0.04        OLR1                                           Lipoprotein metabolic process
GO:0008645  40    0.04        SLC2A5                                         Hexose transport
GO:0042542  41    0.04        OLR1                                           Response to hydrogen peroxide
GO:0008015  44    0.04        OLR1                                           Blood circulation
GO:0009612  45    0.04        MPO                                            Response to mechanical stimulus

GO Biological Process terms significantly over-represented in Module A (N = 26 genes with annotations). P-values were obtained from a Hypergeometric test and were false discovery rate (FDR) corrected. GO ID: unique identifier used by the Gene Ontology consortium. #AG: the total number of genes in the reference set (N = 41,264 genes) annotated with the corresponding term.

Table S9: Module A (NDM) gene content.

Symbol     Conn.  Probe ID      RefSeq ID       Chr  Position                  Full name
DEFA3      9.20   ILMN_2165289  NM_005217.2     8    6,873,391–6,875,816       Defensin, alpha 3, neutrophil-specific
DEFA1B     9.20   ILMN_1725661  NM_001042500.1  8    6,854,288–6,856,724       Defensin, alpha 1B
           9.10   ILMN_1679357                       6,835,171–6,837,614
           9.10   ILMN_2102721                       6,873,391–6,875,823
DEFA1      9.10   ILMN_2193213  NM_004084.2     8    6,835,171–6,837,614       Defensin, alpha 1
                                                     6,854,288–6,856,724
                                                     6,873,391–6,875,823
DEFA4      8.10   ILMN_1753347  NM_001925.1     8    6,793,342–6,795,860       Defensin, alpha 4, corticostatin
ELANE      6.40   ILMN_1706635  NM_001972.2     19   852,291–856,246           Elastase, neutrophil expressed
CEACAM6    6.30   ILMN_1712522  NM_002483.3     19   42,259,428–42,276,113     Carcinoembryonic antigen-related cell adhesion molecule 6 (non-specific cross reacting antigen)
CEACAM8    6.10   ILMN_1806056  NM_001816.2     19   43,084,395–43,099,082     Carcinoembryonic antigen-related cell adhesion molecule 8
LCN2       5.70   ILMN_1692223  NM_005564.3     9    130,911,732–130,915,734   Lipocalin 2
AZU1       4.80   ILMN_1730867  NM_001700.3     19   827,831–832,017           Azurocidin 1
BPI        4.50   ILMN_1766736  NM_001725.1     20   36,932,552–36,965,905     Bactericidal/permeability-increasing protein
CTSG       4.10   ILMN_1680424  NM_001911.2     14   25,042,724–25,045,466     Cathepsin G
CAMP       3.90   ILMN_1688580  NM_004345.3     3    48,264,837–48,266,981     Cathelicidin antimicrobial peptide
LTF        3.80   ILMN_1677920  NM_002343.2     3    46,477,496–46,506,632     Lactotransferrin
MPO        2.50   ILMN_1705183  NM_000250.1     17   56,347,217–56,358,296     Myeloperoxidase
OLR1       2.20   ILMN_1723035  NM_002543.3     12   10,310,899–10,324,790     Oxidized low density lipoprotein (lectin-like) receptor 1
OLFM4      1.70   ILMN_2116877  NM_006418.3     13   53,602,876–53,626,196     Olfactomedin 4
           0.033  ILMN_1753954
COL17A1    1.20   ILMN_1651282  NM_000494.3     10   105,791,046–105,845,638   Collagen, type XVII, alpha 1
           0.16   ILMN_1799105
RNASE3     0.95   ILMN_2113126  NM_002935.2     14   21,359,562–21,360,507     Ribonuclease, RNase A family, 3
RETN       0.79   ILMN_1675190  NM_020415.2     19   7,733,972–7,735,340       Resistin
PRTN3      0.71   ILMN_1668460  NM_002777.3     19   840,985–848,175           Proteinase 3
ABCA13     0.53   ILMN_1704579  NM_152701.2     7    48,211,057–48,687,091     ATP-binding cassette, sub-family A (ABC1), member 13
MMP8       0.52   ILMN_1736026  NM_002424.1     11   102,582,526–102,595,685   Matrix metallopeptidase 8
SLC2A5     0.49   ILMN_1671337  NM_003039.1     1    9,097,005–9,129,887       Solute carrier family 2 (facilitated glucose/fructose transporter), member 5
RNASE2     0.24   ILMN_1730628  NM_002934.2     14   21,423,630–21,424,594     Ribonuclease, RNase A family, 2 (liver, eosinophil-derived neurotoxin)
PCOLCE2    0.21   ILMN_1746888  NM_013363.2     3    142,536,702–142,608,045   Procollagen C-endopeptidase enhancer 2
SERPINB10  0.20   ILMN_2147424  NM_005024.1     18   61,582,745–61,602,476     Serpin peptidase inhibitor, clade B (ovalbumin), member 10
MSX2P1     0.043  ILMN_2197647  NR_002307.1     17   56,234,320–56,236,480     Msh homeobox 2 pseudogene 1

Genes comprising Module A, the Neutrophil Degranulation Module (NDM). Genes with several probes or several genomic copies are listed over multiple lines. Probe ID: Illumina HT12 microarray probe identifiers. RefSeq ID: National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) identifier. Chr: chromosome. Position: location of mRNA sequence, NCBI build 37. Conn.: mean intramodular probe connectivity (relative biological importance to the module [1,2]; Methods) across the DILGOM and YFS2011 cohorts.

Table S10: Module A (NDM) gene annotations.

Symbol: Annotation

DEFA3, DEFA1B, DEFA1, DEFA4: Also known as human neutrophil peptides, the products of these four alpha defensins are abundant in the primary/azurophil granules of neutrophils and epithelia of mucosal surfaces. Their products exhibit antimicrobial activity against phagocytosed bacteria, fungi, enveloped viruses and protozoa [E,3,4].

ELANE: Contained within the primary/azurophil granules of neutrophils, its product is an antimicrobial peptide that hydrolyzes [...]. Its product is also secreted into the extracellular matrix, where it degrades extracellular matrix proteins. It has been shown to be regulated by alpha-1 antitrypsin [E,3,4].

CEACAM6, CEACAM8: Expression of carcinoembryonic antigen-related cell adhesion molecules is used as a marker of various cancers, as their production is thought to stop before birth [E,5]. However, CEACAM-1, -3, -6 and -8 have been found to be expressed on human neutrophils, and their products facilitate neutrophil adhesion to endothelial cells, which line blood vessels [6]. CEACAM1 and CEACAM3 clustered into Module 6 (Table S4).

LCN2: Also known as Neutrophil Gelatinase-Associated Lipocalin (NGAL), it is abundant in the secondary/specific and tertiary/gelatinase granules of neutrophils, as well as in epithelial cells [3,4]. Its TLR-induced protein product prevents bacterial growth through sequestration of iron-scavenging bacterial siderophores [4,7].

AZU1: Azurocidin is abundant in the primary/azurophil granules of neutrophils. It is homologous to the other antimicrobial serine proteases (ELANE, PRTN3, and CTSG) but its product is proteolytically inactive. It increases vascular permeability during neutrophil extravasation, and is chemotactic to several immune cell types [E,3].

BPI: Abundant in the primary/azurophil granules of neutrophils, its product has bactericidal activity against Gram-negative bacteria [3]. It is also expressed in epithelial cells [4].

CTSG: Abundant in the primary/azurophil granules of neutrophils, its product is an antimicrobial serine protease. Its product has antimicrobial properties and proteolytic activity against extracellular matrix proteins, induces the activation of endothelial and epithelial cells, and is chemotactic to several immune cell types [E,3].

CAMP: Constitutively expressed in subsets of lymphocytes, monocytes, and in epithelial cells, CAMP is also expressed in the secondary/specific granules of neutrophils. Its product is set free by the product of PRTN3 after exocytosis. Its product has antimicrobial activity against both Gram-positive and Gram-negative bacteria. It is also secreted from keratinocytes [3].

LTF: Abundant in the secondary/specific granules of neutrophils, its product has antimicrobial activity against both Gram-positive and Gram-negative bacteria. Similarly to LCN2, its product also impairs bacterial growth by sequestration of iron [3].

MPO: A heme protein synthesized during myeloid differentiation, its product forms a major component of the primary/azurophil granules of neutrophils and produces hypohalous acids, which are central to the antimicrobial activity of neutrophils [E,4].

OLR1: Its product is a cell surface receptor expressed in vascular endothelial cells, and has been associated with atherosclerosis and myocardial infarction risk [E,8].

OLFM4: Has been shown to co-localize with LCN2 in the secondary/specific granules of neutrophils. It is also expressed in inflamed epithelial cells [E,9].

COL17A1: Its product is a transmembrane protein found in keratinocytes [E].

RNASE3: Its product is abundant in eosinophil granules and in mature neutrophils, and is toxic to certain parasites, bacteria, and viruses [10,11].

RETN: Expressed in a range of immune cells, its product has a positive effect on endothelial cell activation, and an inhibitory effect on neutrophils [12].

PRTN3: Abundant in the primary/azurophil granules of neutrophils, its product is an antimicrobial serine protease (see CTSG) [E,3].

ABCA13: Its product is a transmembrane transporter protein. Ubiquitous expression has been observed in blood-derived cell lines [13].

MMP8: Also known as neutrophil collagenase, its product is abundant in the secondary/specific granules of neutrophils. Together with the products of MMP9 (clustered into Module 6; Table S4) and MMP25 (two probes clustering into Modules 6 and 10; Table S4), its product degrades major structural components of the extracellular matrix; these enzymes are central to neutrophil extravasation and migration [3].

SLC2A5: Also known as GLUT5, its product is a fructose transporter, which is expressed in enterocytes of the small intestine, as well as monocytes, macrophages and foam cells [14].

RNASE2: A paralog of RNASE3, its product is abundant in eosinophil granules and in mature neutrophils, and is toxic to certain parasites, bacteria, and viruses [10,11].

PCOLCE2: Insufficient information to annotate its function. It is expressed in a wide range of tissues [E,15].

SERPINB10: SERPINB10 is a serine protease inhibitor expressed in bone marrow cells [16].

MSX2P1: MSX2P1 is a pseudogene, therefore non-functional.

Curated annotations for genes comprising Module A (NDM). E: information sourced from the summary provided by RefSeq in the NCBI database. Bracketed numbers correspond to key review papers for each gene.

Table S11: Module quantitative trait loci for the NDM.

a)

| rsID | Chr | SNP position | Minor allele | Cohort | Effect size | 95% confidence interval | P-value |
|---|---|---|---|---|---|---|---|
| rs2485364 | 6 | 159,512,260 | C | DILGOM | 0.0091 | 0.0037–0.015 | 0.001 |
| rs2485364 | 6 | 159,512,260 | C | YFS2011 | 0.0043 | 0.0025–0.0061 | 3 × 10^-6 |
| rs2485364 | 6 | 159,512,260 | C | Meta-analysis | 0.0048 | - | 4 × 10^-8 |
| rs13297295 | 9 | 131,659,724 | C | DILGOM | 0.0093 | 0.0037–0.018 | 0.04 |
| rs13297295 | 9 | 131,659,724 | C | YFS2011 | 0.011 | 0.0083–0.015 | 4 × 10^-12 |
| rs13297295 | 9 | 131,659,724 | C | Meta-analysis | 0.011 | - | 4 × 10^-13 |
| rs140929198 | 20 | 38,555,870 | A | DILGOM | 0.022 | 0.0063–0.038 | 0.006 |
| rs140929198 | 20 | 38,555,870 | A | YFS2011 | 0.015 | 0.010–0.020 | 6 × 10^-9 |
| rs140929198 | 20 | 38,555,870 | A | Meta-analysis | 0.016 | - | 1 × 10^-10 |

b)

| | Chr | LCN2 position | rs13297295 position | Effect size | 95% confidence interval | P-value |
|---|---|---|---|---|---|---|
| rs13297295 vs. LCN2 | 9 | 130,911,732–130,915,734 | 131,659,724 | 0.40 | 0.27–0.52 | 4 × 10^-10 |

a) Associations between the top SNPs for the three module quantitative trait loci (mQTLs) and the NDM summary expression profile in the DILGOM cohort, the YFS2011 cohort, and in a meta-analysis of the two cohorts. Effect sizes denote the increase in normalized probe expression (Methods) per minor allele as estimated by a linear regression. Chr: chromosome.

b) Association between rs13297295 and the NDM gene LCN2, which is located 750 kb upstream, in the YFS2011 cohort. Effect size denotes the increase in normalized probe expression (Methods) per minor allele dosage as estimated by a linear regression.

SNP positions indicate the location on the given chromosome from the Genome Reference Consortium human assembly build 37, patch 13 (GRCh37.p13), which is equivalent to the human assembly NCBI build 37 used to report gene locations.
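The approximate 750 kb separation between rs13297295 and LCN2 can be checked directly from the GRCh37 coordinates reported in Table S11. The short sketch below does that arithmetic; the helper name `distance_to_gene` is hypothetical and introduced only for illustration.

```python
# Coordinates copied from Table S11 (chromosome 9, GRCh37.p13).
LCN2_START, LCN2_END = 130_911_732, 130_915_734
RS13297295_POS = 131_659_724


def distance_to_gene(snp_pos, gene_start, gene_end):
    """Base pairs between a SNP and the nearest edge of a gene (0 if inside).

    Hypothetical helper for illustration; not part of the study's pipeline.
    """
    if snp_pos < gene_start:
        return gene_start - snp_pos
    if snp_pos > gene_end:
        return snp_pos - gene_end
    return 0


distance_bp = distance_to_gene(RS13297295_POS, LCN2_START, LCN2_END)
print(f"{distance_bp:,} bp (~{distance_bp / 1000:.0f} kb)")
```

The result, 743,990 bp, rounds to the 750 kb quoted in the table note.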

Table S12: Associations between Module A (NDM) and leukocyte count in the YFS2011 cohort.

| Association magnitude | 95% confidence interval | P-value |
|---|---|---|
| 0.24 | 0.19–0.28 | 2 × 10^-24 |

Linear regression of NDM summary expression on leukocyte abundance, adjusting for age and sex (Methods). Leukocyte abundance was log transformed and standardised. NDM summary expression was standardised. Association magnitudes denote difference in SD-units of NDM summary expression per SD increase of leukocyte abundance.

Table S13: Independent associations of leukocyte count and NDM expression with GlycA

| | Association magnitude | 95% confidence interval | P-value |
|---|---|---|---|
| NDM summary expression | 0.092 | 0.054–0.13 | 2 × 10^-60 |
| Leukocyte abundance | 0.15 | 0.11–0.19 | 2 × 10^-15 |

Linear regression of GlycA on NDM summary expression and leukocyte count, adjusting for age, sex, and triglyceride levels (Methods). Leukocyte count, GlycA and triglycerides were log transformed and all continuous measurements were standardised. Association magnitudes denote difference in SD-units of GlycA per SD increase of NDM summary expression or leukocyte abundance.
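The following is a minimal sketch of the kind of model described in the note above: log-transform the skewed measurements, standardise all continuous variables, and regress GlycA jointly on NDM summary expression and leukocyte count with age, sex, and triglycerides as covariates. Everything here is an assumption made for illustration: the data are simulated, the effect sizes are arbitrary, and the helpers `standardise`, `ols`, and `simulate_and_fit` are hypothetical names, not the study's code.

```python
import math
import random


def standardise(values):
    """Centre to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def ols(y, predictors):
    """Least-squares fit with an intercept, solved via the normal equations.

    Returns coefficients ordered as [intercept, *predictors].
    """
    n = len(y)
    X = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def simulate_and_fit(n=2000, seed=0):
    """Simulate a synthetic cohort and fit the adjusted model.

    All generating parameters below are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    age = [rng.uniform(30, 50) for _ in range(n)]
    sex = [float(rng.randint(0, 1)) for _ in range(n)]
    ndm = [rng.gauss(0, 1) for _ in range(n)]
    # Leukocyte count: positive and correlated with NDM expression.
    leuk = [math.exp(0.3 * e + rng.gauss(0, 0.3)) for e in ndm]
    trig = [math.exp(rng.gauss(0, 0.4)) for _ in range(n)]
    glyca = [math.exp(0.02 * ndm[i] + 0.05 * math.log(leuk[i])
                      + 0.05 * math.log(trig[i]) + rng.gauss(0, 0.1))
             for i in range(n)]

    # Log-transform skewed measures, then standardise continuous variables.
    y = standardise([math.log(v) for v in glyca])
    predictors = [
        standardise(ndm),
        standardise([math.log(v) for v in leuk]),
        standardise(age),
        sex,  # binary covariate, left unscaled
        standardise([math.log(v) for v in trig]),
    ]
    coef = ols(y, predictors)
    return coef[1], coef[2]  # SD-unit effects of NDM and leukocyte count


beta_ndm, beta_leuk = simulate_and_fit()
print(f"NDM: {beta_ndm:.3f} SD per SD; leukocytes: {beta_leuk:.3f} SD per SD")
```

Because both predictors enter the same model, each coefficient is the association of that predictor with GlycA conditional on the other, which is what makes the two associations "independent" in the sense of the table title.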

REFERENCES

1. Langfelder, P., Luo, R., Oldham, M. C. & Horvath, S. Is my network module preserved and reproducible? PLoS Comput. Biol. 7, e1001057 (2011).

2. Langfelder, P., Mischel, P. S. & Horvath, S. When is hub gene selection better than standard meta-analysis? PLoS One 8, e61505 (2013).

3. Faurschou, M. & Borregaard, N. Neutrophil granules and secretory vesicles in inflammation. Microbes Infect. 5, 1317–1327 (2003).

4. Borregaard, N., Sørensen, O. E. & Theilgaard-Mönch, K. Neutrophil granules: a library of innate immunity proteins. Trends Immunol. 28, 340–345 (2007).

5. Kuespert, K., Pils, S. & Hauck, C. R. CEACAMs: their role in physiology and pathophysiology. Curr. Opin. Cell Biol. 18, 565–571 (2006).

6. Skubitz, K. M. & Skubitz, A. P. N. Interdependency of CEACAM-1, -3, -6, and -8 induced human neutrophil adhesion to endothelial cells. J. Transl. Med. 6, 78 (2008).

7. Flo, T. H. et al. Lipocalin 2 mediates an innate immune response to bacterial infection by sequestrating iron. Nature 432, 917–921 (2004).

8. Kume, N. et al. Inducible expression of lectin-like oxidized LDL receptor-1 in vascular endothelial cells. Circ. Res. 83, 322–327 (1998).

9. Clemmensen, S. N. et al. Olfactomedin 4 defines a subset of human neutrophils. J. Leukoc. Biol. 91, 495–500 (2012).

10. Boix, E. et al. Crystal structure of eosinophil cationic protein at 2.4 Å resolution. Biochemistry 38, 16794–16801 (1999).

11. Sur, S. et al. Localization of eosinophil-derived neurotoxin and eosinophil cationic protein in neutrophilic leukocytes. Int. Arch. Allergy Immunol. 118, 255–258 (1999).

12. Cohen, G., Ilic, D., Raupachova, J. & Hörl, W. H. Resistin inhibits essential functions of polymorphonuclear leukocytes. J. Immunol. 181, 3761–3768 (2008).

13. Albrecht, C. & Viturro, E. The ABCA subfamily-gene and protein structures, functions and associated hereditary diseases. Pflügers Arch. - Eur. J. Physiol. 453, 581–589 (2007).

14. Fu, Y., Maianu, L., Melbert, B. R. & Garvey, W. T. Facilitative glucose transporter gene expression in human lymphocytes, monocytes, and macrophages: a role for GLUT isoforms 1, 3, and 5 in the immune response and foam cell formation. Blood Cells. Mol. Dis. 32, 182–190 (2004).

15. Xu, H., Acott, T. S. & Wirtz, M. K. Identification and expression of a novel type I procollagen C-proteinase enhancer protein gene from the glaucoma candidate region on 3q21-q24. Genomics 66, 264–273 (2000).

16. Riewald, M. & Schleef, R. R. Molecular cloning of bomapin (protease inhibitor 10), a novel human serpin that is expressed specifically in the bone marrow. J. Biol. Chem. 270, 26754–26757 (1995).
