bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
1
1 The interplay between host genetics and the gut microbiome reveals common
2 and distinct microbiome features for human complex diseases
3 Fengzhe Xu1#, Yuanqing Fu1#, Ting-yu Sun2, Zengliang Jiang1,3, Zelei Miao1, Menglei
4 Shuai1, Wanglong Gou1, Chu-wen Ling2, Jian Yang4,5, Jun Wang6*, Yu-ming Chen2*,
5 Ju-Sheng Zheng1,3,7*
6 #These authors contributed equally to the work
7 1 School of Life Sciences, Westlake University, Hangzhou, China.
8 2 Guangdong Provincial Key Laboratory of Food, Nutrition and Health; Department
9 of Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou,
10 China.
11 3 Institute of Basic Medical Sciences, Westlake Institute for Advanced Study,
12 Hangzhou, China.
13 4 Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD,
14 Australia.
15 5 Institute for Advanced Research, Wenzhou Medical University, Wenzhou, Zhejiang
16 325027, China
17 6 CAS Key Laboratory for Pathogenic Microbiology and Immunology, Institute of
18 Microbiology, Chinese Academy of Sciences, Beijing, China.
19 7 MRC Epidemiology Unit, University of Cambridge, Cambridge, UK.
20
21 Short title: Interplay between and host genetics and gut microbiome
22
1 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
2
23 *Correspondence to
24 Prof Ju-Sheng Zheng
25 School of Life Sciences, Westlake University,18 Shilongshan Rd, Cloud Town,
26 Hangzhou, China. Tel: +86 (0)57186915303. Email: [email protected]
27 And
28 Prof Yu-Ming Chen
29 Guangdong Provincial Key Laboratory of Food, Nutrition and Health; Department of
30 Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou, China.
31 Email: [email protected]
32 And
33 Prof Jun Wang
34 CAS Key Laboratory for Pathogenic Microbiology and Immunology, Institute of
35 Microbiology, Chinese Academy of Sciences, Beijing, China.
36 Email: [email protected]
37
2 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
3
38 Abstract
39 There is increasing interest about the interplay between host genetics and gut
40 microbiome on human complex diseases, with prior evidence mainly derived from
41 animal models. In addition, the shared and distinct microbiome features among
42 human complex diseases remain largely unclear. We performed a microbiome
43 genome-wide association study to identify host genetic variants associated with gut
44 microbiome in a Chinese population with 1475 participants. We then conducted
45 bi-directional Mendelian randomization analyses to examine the potential causal
46 associations between gut microbiome and human complex diseases. We found that
47 Saccharibacteria (also known as TM7 phylum) could potentially improve renal
48 function by affecting renal function biomarkers (i.e., creatinine and estimated
49 glomerular filtration rate). In contrast, atrial fibrillation, chronic kidney disease and
50 prostate cancer, as predicted by the host genetics, had potential causal effect on gut
51 microbiome. Further disease-microbiome feature analysis suggested that gut
52 microbiome features revealed novel relationship among human complex diseases.
53 These results suggest that different human complex diseases share common and
54 distinct gut microbiome features, which may help re-shape our understanding about
55 the disease etiology in humans.
56
3 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
4
57 Introduction
58 Ever-increasing evidence has suggested that gut microbiome is involved in many
59 physiological processes, such as energy harvest, immune response, and neurological
60 function1-3. With successes of investigation into the clinical application of fecal
61 transplants, modulation of gut microbiome has emerged as a potential treatment
62 option for some complex diseases, including inflammatory bowel disease and
63 colorectal cancer 4,5. However, it is still unclear whether the gut microbiome has the
64 potential to be clinically applied for the prevention or treatment of many other
65 complex diseases. Therefore, it is important to clarify the bi-directional causal
66 association between gut microbiome and human complex diseases or traits.
67
68 Mendelian randomization (MR) is a method that uses genetic variants as instrumental
69 variables to investigate the causality between an exposure and outcome in
70 observational studies6. Prior literature provides evidence that the composition or
71 structure of the gut microbiome can be influenced by the host genetics7-10. On the
72 other hand, host genetic variants associated with gut microbiome were rarely explored
73 in Asian populations, thus we are still lacking instrument variables to perform MR for
74 gut microbiome in Asians. This calls for novel microbiome genome-wide association
75 study (GWAS) in Asian populations.
76
77 Along with the causality issue between the gut microbiome and human complex
78 diseases, it is so far unclear whether human complex diseases had similar or unique
4 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
5
79 gut microbiome features. Identifying common and distinct gut microbiome features
80 across different diseases might shed light on novel relationships among the complex
81 diseases and update our understanding about the disease etiology in humans. However,
82 the composition and structure of gut microbiome are influenced by a variety of factors
83 including environment, diet and regional variation 11-13, which posed a key challenge
84 for the description of representative microbiome features for a specific disease.
85 Although there were several studies comparing disease-related gut microbiome 14-16,
86 few of them has examined and compared the microbiome features across different
87 human complex diseases.
88
89 In the present study, we performed a microbiome GWAS in a Chinese cohort study:
90 the Guangzhou Nutrition and Health Study (GNHS)17, including 1475 participants.
91 Subsequently, we applied a bi-directional MR method to explore the genetically
92 predicted relationship between gut microbiome and human complex diseases. To
93 explore novel relationships among human complex diseases based on gut microbiome,
94 we investigated the shared and distinct gut microbiome features across diverse human
95 complex diseases 18.
96
97 Result
98 Overview of the study
99 Our study was based on the GNHS, with 4048 participants (40-75 years old) living in
100 urban Guangzhou city recruited during 2008 and 2013 17. In the GNHS, stool samples
5 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
6
101 were collected among 1937 participants during follow-up visits, among which 1475
102 unrelated participants without taking anti-biotics were included in our discovery
103 microbiome GWAS. We then included additional 199 participants with both genetic
104 and gut microbiome data as a replication cohort, which belonged to the control arm of
105 a case-control study of hip fracture in Guangdong Province, China 19.
106
107 For both discovery and replication cohorts, genotyping was carried out with Illumina
108 ASA-750K arrays. Quality control and relatedness filters were performed by
109 PLINK1.9 20. We conducted the genome-wide genotype imputation with 1000
110 Genomes Phase3 v5 reference panel by Minimac321-23. HLA region was imputed with
111 Pan-Asian reference panel and SNP2HLA v1.0.3 24-26.
112
113 Association of host genetics with gut microbiome features
114 We performed a series of microbiome GWAS with PLINK 1.9 based on logistic
115 models for binary variables 20. For continuous variables, we used GCTA with mixed
116 linear model-based association (MLMA) method 27,28. We also analyzed categorical
117 variable enterotypes of the participants based on genus-level relative abundance of gut
118 microbiome, using the Jensen-Shannon Distance (JSD) and the Partitioning Around
119 Medoids (PAM) clustering algorithm 29. The participants were subsequently clustered
120 into two groups according to the enterotypes (Prevotella vs Bacteroides). Thereafter,
121 we performed GWAS for enterotypes using logistic regression model to explore
122 potential associations between host genetics and enterotypes. However, we did not
6 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
7
123 find any genome-wide significant locus with p<5×10-8. Furthermore, we used a
124 restricted maximum likelihood analysis (REML) with GCTA to estimate the
125 SNP-based heritability, and the estimate heritability of the enterotype was 0.055
126 (SE=0.19, Supplementary Table S2) 30.
127
128 To examine the association of host genetic variants with alpha diversity, we performed
129 GWAS for three indices (Shannon diversity index, Chao1 diversity indices and
130 observed OTUs index), but again no genome-wide significant signal (p<5×10-8) was
131 found. In the discovery cohort, the heritability of alpha diversity ranged from 0.054 to
132 0.14 (SE=0.20 for all indices, Supplementary Table S2). To further investigate if there
133 is host genetic basis underlying alpha diversity, we constructed a polygenic score for
134 each alpha diversity indicator in the replication cohort, using the genetic variants
135 which showed suggestive significance (p<5×10-5) in the discovery GWAS. The
136 polygenic score was not significantly associated with its corresponding alpha diversity
137 index in our replication cohort. Meanwhile, none of the associations with alpha
138 diversity indices reported in the literature could be replicated (Supplementary Table
139 S7) 7.
140
141 We performed a beta diversity GWAS using a tool called MicrobiomeGWAS 31, and
142 found that one locus at SMARCA2 gene (rs6475456) was associated with
143 beta-diversity at a genome-wide significance level (p=3.96×10-9). However, we could
144 not replicate the results in the replication cohort, which may be due to the limited
7 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
8
145 sample size of the replication cohort. In addition, prior literature had reported 73
146 genetic variants that were associated with beta diversity 8,13,32,33, among which we
147 found that 3 single nucleotide polymorphisms (SNP, UHRF2 gene-rs563779, LHFPL3
148 gene-rs12705241, CTD-2135J3.4-rs11986935) had nominal significant (p<0.05)
149 association with beta-diversity in our cohort (Supplementary Table S6), although none
150 of the association survived Bonferroni correction. These studies used various methods
151 for the sequencing and calculation of beta diversity, which raised challenges to verify
152 and extrapolate results across populations.
153
154 We then took the genetic loci reported to be associated with individual taxa in prior
155 studies 7,8,13,33 for replication in our GNHS dataset. Although there are still some
156 signals with nominal significance (p<0.05) in our study (e.g., 7 loci associated with
157 Lachnospiraceae, Coprococcus or Bacteroides with p<0.05; Supplementary Table S5),
158 none of the associations of these genetic variants with taxa survived the Bonferroni
159 correction (p<1×10-4). The null results may be because of various clustering
160 similarities, classifiers or reference databases to annotate taxa and different
161 sequencing methods used in these studies.
162
163 We subsequently performed GWAS discovery for individual gut microbes in our own
164 GNHS discovery dataset. For the taxa present at more than ninety percent of
165 participants and alpha diversity, we used Z-score normalization to transform the
166 distribution and carried out analysis based on a log-normal model. A MLMA test in
8 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
9
167 the GCTA was used to assess the association, with the first five principal components,
168 age, sex and sequencing batch fitted as fixed effects and the effects of all the SNPs
169 fitted as random effects 27,28,34. For other taxa present at fewer than ninety percent, we
170 transformed the absence/presence of the taxon into binary variables and used
171 PLINK1.9 to perform a logistic model, adjusted for the first five principal components,
172 age, sex and sequencing batch.
173
174 For all the gut microbiome taxa, the significant threshold was defined as 5×10-8 in the
175 discovery stage. As some taxa were correlated with each other, we also used an
176 eigendecomposition analysis to calculate the effective number of independent taxa on
177 each taxonomy level (phylum level: 2.3, class level: 2.9, order level: 2.9, family level:
178 5.5, genus level: 5.6, species level: 3.2) 35,36. We found that 6 taxa were associated
179 with host genetic variants in the discovery cohort (p<5×10-8/n, n is the effective
180 number of independent taxa on each taxonomy level, Supplementary Table S4);
181 however, these associations were not significant (p>0.05) in the replication cohort. We
182 then used a threshold of p<5×10-5 at the GWAS discovery stage to incorporate
183 additional genetic variants which may explain a larger proportion of heritability for
184 taxa, based on which we constructed a polygenic score for each taxon in the
185 replication. We found that the polygenic scores were significantly associated with 5
186 taxa including Saccharibacteria (also known as TM7 phylum), Clostridiaceae,
187 Comamonadaceae, Klebsiella and D168 species (belong to Desulfovibrio genus) in
188 the replication set (p<0.05, Methods, see also Figure 1A, 1B, 1C, 1D, 1E).
9 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
10
189
190 Genetic correlation of gut microbiome and traits
191 We used GCTA to perform a bivariate GREML (genomic-relatedness-based restricted
192 maximum-likelihood) analysis to estimate the genetic correlation between gut
193 microbiome and traits in the GNHS 27,37. The traits included BMI, fasting blood sugar
194 (FBS), glycosylated hemoglobin (HbA1c), systolic blood pressure (SBP), diastolic
195 blood pressure (DBP), high density lipoprotein cholesterol (HDL-C), low density
196 lipoprotein cholesterol (LDL-C), total cholesterol (TC) and triglyceride (TG), none of
197 which could pass Bonferroni correction. Additionally, HDL-C was the only trait that
198 had nominal genetic correlation (p<0.05) with gut microbes (specifically,
199 Desulfovibrionaceae and [Prevotella], Supplementary Table S3).
200
201 Bi-directional assessment of the genetically predicted association between gut
202 microbiome and complex diseases/traits
203 Using genetic variants-composed polygenic scores as genetic instruments, we
204 performed MR analysis to assess the putative causal effect of microbiome
205 (Saccharibacteria, Clostridiaceae, Comamonadaceae, Klebsiella and D168 species)
206 on human complex diseases or traits. Inverse variance weighted (IVW) method was
207 used for the MR analysis, while other three methods (Weighted median, MR-Egger
208 and MR-PRESSO) 38,39 were applied to confirm the robustness of results. The
209 horizontal pleiotropy was assessed using MR-PRESSO Global test and MR-Egger
210 Regression. For the analysis of gut microbiome on complex traits, we downloaded
10 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
11
211 public available GWAS summary statistics of complex traits (n=58) and diseases
212 (type 2 diabetes mellitus (T2DM), atrial fibrillation (AF), colorectal cancer (CRC)
213 and prostatic cancer (PCa)) reported by BioBank Japan 40-44. The result suggested that
214 Saccharibacteria could potentially decrease the concentration of serum creatinine
215 (p=0.008, Supplementary Table S9) and increase estimated glomerular filtration
216 rate(eGFR)( p=0.003, Supplementary Table S9), which helps improve renal function.
217 We did not find evidence of pleiotropic effect: genetic variants associated with
218 Saccharibacteria were not associated with any of the above traits (58 complex traits
219 and 4 disease outcomes, p<0.05/62). These taxa were not causally associated with
220 other human complex diseases or traits in our MR analyses, which may be due to the
221 limited genetic instruments discovered in our present study.
222
223 We subsequently performed a reserve MR analysis to assess the potential causal effect
224 of human complex diseases on gut microbiome features. For the reserve MR analyses,
225 the diseases of interests included T2DM, AF, coronary artery disease (CAD), chronic
226 kidney disease (CKD), Alzheimer's disease (AD), CRC and PCa, and their
227 instrumental variables for the MR analysis were based on the previous large-scale
228 GWAS in East Asians40,45-50. The results suggested that AF and CKD were causally
229 associated with gut microbiome (See also Figure 2A, 2B, Supplementary Table S10).
230 Genetically predicted higher risk of AF was associated with lower abundance of
231 Coprophilus, Lachnobacterium, Barnesiellaceae, Veillonellaceae and Mitsuokella,
232 and higher abundance of Alcaligenaceae. Additionally, genetically predicted higher
11 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
12
233 risk of CKD could increase Anaerostipes abundance and higher risk of PCa could
234 decrease [Prevotella].
235
236 To further investigate the potential complex diseases that may be correlated with the
237 taxa affected by AF, we applied Phylogenetic Investigation of Communities by
238 Reconstruction of Unobserved States (PICRUSt) to predict the disease pathway
239 abundance 51. We used Spearman's rank-order correlation to test whether 22 human
240 complex diseases were associated with the aforementioned AF-associated taxa (See
241 also Figure 2C). The heatmap indicated that cancers and neurodegenerative diseases
242 including Parkinson’s disease (PD), AD, amyotrophic lateral sclerosis (ALS) as well
243 as AF were correlated with similar gut microbiome. Although the association among
244 these diseases are highly supported by previous studies 52-54, no study has compared
245 common gut microbiome features across these different diseases.
246
247 Microbiome features of human complex diseases
248 To compare gut microbiome features across human diseases, we chose 22 human
249 complex diseases from predicted abundance and performed k-medoids clustering 18.
250 We used an matrix to perform the cluster analysis, where m is the number of
251 participantsm and × n n is the number of selected diseases. According to optimum average
252 silhouette width 55, we chose optimal number of clusters for further analysis (See also
253 Figure 3A). The plot showed that neurological diseases including ALS and AD
254 belonged to the same cluster, while PD and CRC had much similarity in gut
12 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
13
255 microbiome. The results also suggested that systemic lupus erythematosus (SLE) and
256 chronic myeloid leukemia (CML) shared similar gut microbiome features. Moreover,
257 we could replicate these clusters in our replication cohort, which suggested that the
258 clustering results were robust (See also Figure 3B).
259
260 We further asked whether gut microbiome contributed to the novel clustering. To this
261 end, we repeated the analysis among participants who took antibiotic less than two
262 weeks before stool sample collection, considering that antibiotic treatments were
263 believed to cause microbiome imbalance, and the clusters were totally different in this
264 group (See also Figure 3C). The results indicated a totally different clustering, which
265 suggested that gut microbiome indeed contributed to the correlations among diseases.
266 To further demonstrate common microbiome features across different diseases, we
267 examined the correlation of the predicted diseases with genus-level taxa. The results
268 showed that human complex diseases had shared similar gut microbiome features, as
269 well as distinct features on their own (See also Figure 4).
270
271 To validate the accuracy of the association between the predicted disease-related gut
272 microbiome features and the corresponding disease, we used T2DM as an example,
273 examining the association of predicted T2DM-related microbiome features with
274 T2DM risk in our GNHS samples. We constructed a microbiome risk score (MRS)
275 based on 16 selected taxa with predicted correlation coefficient with T2DM greater
276 than 0.2. A logistic regression was used to examine the above MRS with T2DM risk
13 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
14
277 in the GNHS (n=1886, with 217 T2DM cases). The results showed that higher MRS
278 was associated with lower risk of T2DM (odds ratio: 0.850, 95% confidence interval:
279 0.804 to 0.898, p=8.75×10-9).
280
281 Based on the above results, we proposed a hypothesis that related diseases might
282 share similar gut microbiome features. To test for this hypothesis, we performed
283 validation analysis by including GNHS participants who had one of the following
284 self-reported diseases: stroke (n=8), chronic hepatitis (n=19), coronary heart diseases
285 (CHD) (n=40), cataract (n=124) and insomnia (n=68). The results of k-medoids
286 clustering suggested that CHD, cataract and insomnia shared common gut
287 microbiome features, which was supported by the prior research reporting that both
288 patients suffering insomnia and women receiving cataract extraction had increased
289 risks of CHD 56-58.
290
291 Discussion
292 Our study is among the first to investigate the host genetics-gut microbiome
293 association in East Asian populations and reveals that several microbiome species
294 (e.g., Saccharibacteria and Klebsiella) are influenced by host genetics. We then show
295 that Saccharibacteria might causally improve renal function by affecting renal
296 function biomarkers (i.e., creatinine and eGFR). On the other hand, complex diseases
297 such as atrial fibrillation, chronic kidney disease and prostate cancer, have potential
298 causal effect on gut microbiome. More interestingly, our results indicate that different
14 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
15
299 human complex diseases may be mechanically correlated by sharing common gut
300 microbiome features, but also maintaining their own distinct microbiome features.
301
302 Previous studies and our study showed that gut microbiome had an inclination to be
303 influenced by host genetics 8,10,33,59, although the successful replication tends to be
304 rare. We could not validate any of the reported genetic variants that were significantly
305 associated with gut microbiome features in prior reports, which may reflect the
306 difference in population and heterogeneity between study but also raise concerns
307 about the reproducibility. Many factors including ethnic differences,
308 gene-environment interaction and dissimilarity in sequencing methods may make it
309 hard to extrapolate results from microbiome GWAS across populations in the
310 microbiome field. Nevertheless, we successfully replicate several polygenic scores of
311 gut microbiome, and the current study represent the largest dataset in Asian
312 populations and would be a unique resource to be used in large-scale trans-ethnic
313 meta-analysis of microbiome GWAS in future.
314
315 The MR analysis of gut microbiome on diseases or traits found that Saccharibacteria
316 might decrease the concentration of serum creatinine and increase eGFR. Little is
317 known about the Saccharibacteria as one of the uncultivated phyla, and it might be
318 essential for the immune response, oral inflammation and inflammatory bowel disease
319 as shown previously60-62. Our results also provide genetic instrument of
320 Saccharibacteria for further causal analysis with other complex diseases. The reverse
15 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
16
321 MR analysis provided evidence that AF, CKD and PCa could causally influence gut
322 microbiome. As our study is among the first to investigate gene-microbiome
323 association in East Asians, we need further study in this region to confirm our results.
324 Additionally, rare and low-frequency variants may have an important impact on
325 common diseases63, thus it will be of interest to clarify the effects of low-frequency
326 variants on gut microbiome in cohorts with large sample sizes in future.
327
328 Our results indicate that gut microbiome helps reveal novel and interesting
329 relationships among human complex diseases, suggesting that different diseases may
330 have common and distinct gut microbiome features. A prior study including
331 participants from different countries identified three microbiome clusters29. Notably,
332 this study focused on classifying the individuals into distinct enterotypes regardless of
333 the individuals’ health status, while in the present study we described representative
334 microbiome features for diseases of interest. We provide an approach to interpret the
335 data from mechanistic studies based on microbiome. The microbiome features
336 revealed a close association of AF with neurodegenerative diseases as well as cancers,
337 which was supported by prior studies showing that AF had correlation with AD and
338 PD52,53, and AF patients had relatively higher risks of several cancers including lung
339 cancer and CRC54,64. We also observed that microbiome features of SLE and CML
340 were highly similar. Interestingly, a tyrosine kinase inhibitor of platelet-derived
341 growth factor receptor, imatinib, was widely used to treat CML and significantly
342 ameliorated survival in murine lupus autoimmune disease65. In addition, association
16 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
17
343 between CRC and PD has been reported in several observational cohorts66,67.
344 Collectively, these findings strongly support our hypothesis that human complex
345 diseases sharing similar microbiome features might be mechanically correlated.
346 Furthermore, from the perspectives of risk genes of AF and neurodegenerative
347 diseases, previous GWAS for AF have identified two loci at PITX2 gene-rs6843082
348 and C9orf3 gene-rs7026071, which were also associated with the risk of ALS
349 (p=0.0138 and p=0.049, respectively) 68-70.
350
351 In summary, we perform bi-directional MR analyses and reveal some causal
352 relationships between abundance of gut microbiome and human complex diseases or
353 traits. The disease and gut microbiome feature analysis from population studies
354 reveals novel relationships among human complex diseases, which may help re-shape
355 our understanding about the disease etiology, as well as extending clinical indications
356 of existing drugs for different diseases.
357
358 Method
359 Study participants and sample collection
360 Our study was based on the Guangzhou Nutrition and Health Study (GNHS), with
361 4048 participants (40-75 years old) living in urban Guangzhou city recruited during
362 2008 and 2013 17. We followed up participants every three years. In the GNHS, stool
363 samples were collected among 1937 participants during follow-up visits. Among
364 those with stool samples, 1717 participants had genetic data and 1475 participants
17 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
18
365 with identical by decent (IBD) less than 0.185.
366
367 We included 199 participants with both genetic and gut microbiome data as a
368 replication cohort, which belonged to the control arm of a case-control study of hip
369 fracture with the participants (52-83 years old) recruited between June 2009 and
370 August 2015 in Guangdong Province, China 19.
371
372 Blood samples of all participants were collected after an overnight fasting and buffy
373 coat was separated from whole blood and stored at -80℃. Stool samples were
374 collected during the on-site visit of the participants at Sun Yat-sen University. All
375 samples were manually stirred, separated into tubes and stored at -80℃ within four
376 hours.
377
378 Genotyping data
379 For both discovery and replicattion cohorts, DNA was extracted from leukocyte using
380 the TIANamp® Blood DNA Kit as per the manufacturer’s instruction. DNA
381 concentrations were determined using the Qubit quantification system (Thermo
382 Scientific, Wilmington, DE, US). Extracted DNA was stored at -80°C. Genotyping
383 was carried out with Illumina ASA-750K arrays. Quality control and relatedness
384 filters were performed by PLINK1.9 20. Individuals with high or low proportion of
385 heterozygous genotypes (outliers defined as 3 standard deviation) were excluded71.
386 Individuals who had different ancestries (the first two principal components ±5
18 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
19
387 standard deviation from the mean) or related individuals (IBD>0.185) were
388 excluded71. Variants were mapping to the 1000 Genomes Phase3 v5 by SHAPEIT 23,72
389 and then we conducted the genome-wide genotype imputation with 1000 Genomes
390 Phase3 v5 reference panel by Minimac321,22. Genetic variants with imputation
391 accuracy RSQR > 0.3 and MAF > 0.05 were included in our analysis. We used
392 Pan-Asian reference panel consist of 502 participants and SNP2HLA v1.0.3 to impute
393 HLA region 24-26.
394
395 Sequencing and processing of 16S rRNA data
396 Microbial DNA was extracted from fecal samples using the QIAamp® DNA Stool
397 Mini Kit per the manufacturer’s instruction. DNA concentrations were determined
398 using the Qubit quantification system. The V3-V4 region of the 16S rRNA gene was
399 amplified from genomic DNA using primers 341F and 805R. Sequencing was
400 performed using MiSeq Reagent Kits v2 on the Illuimina MiSeq System.
401
402 Fastq-files were demultiplexed by the MiSeq Controller Software. Ultra-fast sequence
403 analysis (USEARCH) was performed to trim the sequence for amplification primers,
404 diversity spacers, sequencing adapters, merge-paired and quality filter73. Operational
405 taxonomic units (OTUs) were clustered based on 97% similarity using UPARSE74.
406 OTUs were annotated with Greengenes 13_8
407 (https://greengenes.secondgenome.com/). After randomly selecting 10000 reads for
408 each sample, Quantitative Insights into Microbial Ecology (QIIME) software version
19 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
20
409 1.9.0 was used to calculate alpha diversity (Shannon diversity index, Chao1 diversity
410 indices and observed OTUs index) based on the rarefied OTU counts75.
411
412 Statistical analysis
413 Genome-wide association analysis of gut microbiome features
414 For each of the GNHS participants and the replication cohort, we clustered
415 participants based on genus-level relative abundance, estimating the JSD distance and
416 PAM clustering algorithm, and then defined two enterotypes according to
417 Calinski-Harabasz Index29,76. PLINK 1.9 was used to perform a logistic regression
418 model for enterotypes and taxa present at fewer than ninety percent, adjusted for age,
419 sex and the first five principal components.
420
421 For beta diversity, the analyses for the genome-wide host genetic variants with beta
422 diversity was performed using MicrobiomeGWAS31, adjusted for covariates including
423 the first five principal components, age and sex. We filtered OTUs present at fewer
424 than ten percent of participants to calculate Bray–Curtis dissimilarity.
425
426 Alpha diversity was calculated after randomly sampling 10000 reads per sample. For
427 the taxa present at more than ninety percent of participants and alpha diversity, we
428 used Z-score normalization to transform the distribution and carried out analysis
429 based on a log-normal model. A mixed linear model based association (MLMA) test
430 in GCTA was used to assess the association, fitting the first five principal components,
20 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
21
431 age, sex and sequencing batch as fixed effects and the effects of all the SNPs as
432 random effects 27,28,34. For other taxa present at fewer than ninety percent, we
433 transformed the absence/presence of the taxon into binary variables and used
434 PLINK1.9 to perform a logistic model, adjusted for the first five principal components,
435 age, sex and sequencing batch. For all the gut microbiome features, the significant
436 threshold was defined as 5×10-8 /n (n is the effective number of independent taxa on
437 each taxonomy level) in the discovery stage. We estimated genomic inflation factors
438 with LDSC v1.0.1 at local server 77.
439
440 Proportion of variance explained by all SNPs
441 We used the GREML method in GCTA to estimate the proportion of variance
442 explained by all SNPs 30. The taxa were divided into two groups based on whether the
443 taxa were present in the ninety percent of participants or not. For alpha diversity and
444 taxa, our model was adjusted for constrain covariates including sex and sequencing
445 batch, as well as quantitative covariates including the first five principal components
446 and age. The model was adjusted for the same covariates except for sequencing batch
447 for analysis of enterotype.
448
449 Genetic correlation of gut microbiome and traits
450 We used GCTA to perform a bivariate GREML analysis to estimate the genetic
451 correlation between gut microbiome and traits in the GNHS27,37. The gut microbiome
452 was divided into two groups according to the previous description. We used
21 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
22
453 continuous variables to taxa present at more than ninety percent of participants and
454 traits. For taxa present at fewer than ninety percent of participants, we used binary
455 variables according to the absence/presence of taxa. This analysis included traits such
456 as BMI, FBS, HbA1c, SBP, DBP, HDL-C, LDL-C, TC and TG.
457
458 Constructing polygenic scores for taxa and alpha diversity
459 We selected lead SNPs using PLINK v1.9 with the ‘—clump’ command to clump
460 SNPs that p value < 5×10-5 and r2<0.1 within 0.1 cM. We used beta coefficients as
461 weight to construct polygenic scores for taxa and alpha diversity. For alpha diversity
462 and taxa present at more than ninety percent participants, we constructed weighted
463 polygenic scores and performed the analysis on a general linear model with a negative
464 binomial distribution to test for association between the polygenic scores and taxa,
465 adjusted for the first five principal components, age, sex and sequencing batch. We
466 used weighted polygenic scores and logistic regression to the absence/presence taxa,
467 adjusted for the same covariates as in the above analysis. Taxa with significance
468 (p<0.05) in the replication cohort were included for the further analysis.
469
470 The effective number of independent taxa
471 As some taxa were correlated with each other, we used an eigendecomposition
472 analysis to calculate the effective number of independent taxa on each taxonomy level
473 35,36. Matrix M is an m×n matrix, where m is the number of participants and n is the
474 number of taxa on the corresponding taxa level. Matrix A is the variance-covariance
22 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
23
475 matrix of matrix M. P is the matrix of eigenvectors. is the diagonal