bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1

1 The interplay between host genetics and the gut microbiome reveals common

2 and distinct microbiome features for human complex diseases

3 Fengzhe Xu1#, Yuanqing Fu1#, Ting-yu Sun2, Zengliang Jiang1,3, Zelei Miao1, Menglei

4 Shuai1, Wanglong Gou1, Chu-wen Ling2, Jian Yang4,5, Jun Wang6*, Yu-ming Chen2*,

5 Ju-Sheng Zheng1,3,7*

6 #These authors contributed equally to the work

7 1 School of Life Sciences, Westlake University, Hangzhou, China.

8 2 Guangdong Provincial Key Laboratory of Food, Nutrition and Health; Department

9 of Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou,

10 China.

11 3 Institute of Basic Medical Sciences, Westlake Institute for Advanced Study,

12 Hangzhou, China.

13 4 Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD,

14 Australia.

15 5 Institute for Advanced Research, Wenzhou Medical University, Wenzhou, Zhejiang

16 325027, China

17 6 CAS Key Laboratory for Pathogenic Microbiology and Immunology, Institute of

18 Microbiology, Chinese Academy of Sciences, Beijing, China.

19 7 MRC Epidemiology Unit, University of Cambridge, Cambridge, UK.

20

21 Short title: Interplay between and host genetics and gut microbiome

22

1 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

2

23 *Correspondence to

24 Prof Ju-Sheng Zheng

25 School of Life Sciences, Westlake University,18 Shilongshan Rd, Cloud Town,

26 Hangzhou, China. Tel: +86 (0)57186915303. Email: [email protected]

27 And

28 Prof Yu-Ming Chen

29 Guangdong Provincial Key Laboratory of Food, Nutrition and Health; Department of

30 Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou, China.

31 Email: [email protected]

32 And

33 Prof Jun Wang

34 CAS Key Laboratory for Pathogenic Microbiology and Immunology, Institute of

35 Microbiology, Chinese Academy of Sciences, Beijing, China.

36 Email: [email protected]

37

2 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

3

38 Abstract

39 There is increasing interest about the interplay between host genetics and gut

40 microbiome on human complex diseases, with prior evidence mainly derived from

41 animal models. In addition, the shared and distinct microbiome features among

42 human complex diseases remain largely unclear. We performed a microbiome

43 genome-wide association study to identify host genetic variants associated with gut

44 microbiome in a Chinese population with 1475 participants. We then conducted

45 bi-directional Mendelian randomization analyses to examine the potential causal

46 associations between gut microbiome and human complex diseases. We found that

47 Saccharibacteria (also known as TM7 phylum) could potentially improve renal

48 function by affecting renal function biomarkers (i.e., creatinine and estimated

49 glomerular filtration rate). In contrast, atrial fibrillation, chronic kidney disease and

50 prostate cancer, as predicted by the host genetics, had potential causal effect on gut

51 microbiome. Further disease-microbiome feature analysis suggested that gut

52 microbiome features revealed novel relationship among human complex diseases.

53 These results suggest that different human complex diseases share common and

54 distinct gut microbiome features, which may help re-shape our understanding about

55 the disease etiology in humans.

56

3 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

4

57 Introduction

58 Ever-increasing evidence has suggested that gut microbiome is involved in many

59 physiological processes, such as energy harvest, immune response, and neurological

60 function1-3. With successes of investigation into the clinical application of fecal

61 transplants, modulation of gut microbiome has emerged as a potential treatment

62 option for some complex diseases, including inflammatory bowel disease and

63 colorectal cancer 4,5. However, it is still unclear whether the gut microbiome has the

64 potential to be clinically applied for the prevention or treatment of many other

65 complex diseases. Therefore, it is important to clarify the bi-directional causal

66 association between gut microbiome and human complex diseases or traits.

67

68 Mendelian randomization (MR) is a method that uses genetic variants as instrumental

69 variables to investigate the causality between an exposure and outcome in

70 observational studies6. Prior literature provides evidence that the composition or

71 structure of the gut microbiome can be influenced by the host genetics7-10. On the

72 other hand, host genetic variants associated with gut microbiome were rarely explored

73 in Asian populations, thus we are still lacking instrument variables to perform MR for

74 gut microbiome in Asians. This calls for novel microbiome genome-wide association

75 study (GWAS) in Asian populations.

76

77 Along with the causality issue between the gut microbiome and human complex

78 diseases, it is so far unclear whether human complex diseases had similar or unique

4 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

5

79 gut microbiome features. Identifying common and distinct gut microbiome features

80 across different diseases might shed light on novel relationships among the complex

81 diseases and update our understanding about the disease etiology in humans. However,

82 the composition and structure of gut microbiome are influenced by a variety of factors

83 including environment, diet and regional variation 11-13, which posed a key challenge

84 for the description of representative microbiome features for a specific disease.

85 Although there were several studies comparing disease-related gut microbiome 14-16,

86 few of them has examined and compared the microbiome features across different

87 human complex diseases.

88

89 In the present study, we performed a microbiome GWAS in a Chinese cohort study:

90 the Guangzhou Nutrition and Health Study (GNHS)17, including 1475 participants.

91 Subsequently, we applied a bi-directional MR method to explore the genetically

92 predicted relationship between gut microbiome and human complex diseases. To

93 explore novel relationships among human complex diseases based on gut microbiome,

94 we investigated the shared and distinct gut microbiome features across diverse human

95 complex diseases 18.

96

97 Result

98 Overview of the study

99 Our study was based on the GNHS, with 4048 participants (40-75 years old) living in

100 urban Guangzhou city recruited during 2008 and 2013 17. In the GNHS, stool samples

5 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

6

101 were collected among 1937 participants during follow-up visits, among which 1475

102 unrelated participants without taking anti-biotics were included in our discovery

103 microbiome GWAS. We then included additional 199 participants with both genetic

104 and gut microbiome data as a replication cohort, which belonged to the control arm of

105 a case-control study of hip fracture in Guangdong Province, China 19.

106

107 For both discovery and replication cohorts, genotyping was carried out with Illumina

108 ASA-750K arrays. Quality control and relatedness filters were performed by

109 PLINK1.9 20. We conducted the genome-wide genotype imputation with 1000

110 Genomes Phase3 v5 reference panel by Minimac321-23. HLA region was imputed with

111 Pan-Asian reference panel and SNP2HLA v1.0.3 24-26.

112

113 Association of host genetics with gut microbiome features

114 We performed a series of microbiome GWAS with PLINK 1.9 based on logistic

115 models for binary variables 20. For continuous variables, we used GCTA with mixed

116 linear model-based association (MLMA) method 27,28. We also analyzed categorical

117 variable enterotypes of the participants based on genus-level relative abundance of gut

118 microbiome, using the Jensen-Shannon Distance (JSD) and the Partitioning Around

119 Medoids (PAM) clustering algorithm 29. The participants were subsequently clustered

120 into two groups according to the enterotypes (Prevotella vs Bacteroides). Thereafter,

121 we performed GWAS for enterotypes using logistic regression model to explore

122 potential associations between host genetics and enterotypes. However, we did not

6 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

7

123 find any genome-wide significant locus with p<5×10-8. Furthermore, we used a

124 restricted maximum likelihood analysis (REML) with GCTA to estimate the

125 SNP-based heritability, and the estimate heritability of the enterotype was 0.055

126 (SE=0.19, Supplementary Table S2) 30.

127

128 To examine the association of host genetic variants with alpha diversity, we performed

129 GWAS for three indices (Shannon diversity index, Chao1 diversity indices and

130 observed OTUs index), but again no genome-wide significant signal (p<5×10-8) was

131 found. In the discovery cohort, the heritability of alpha diversity ranged from 0.054 to

132 0.14 (SE=0.20 for all indices, Supplementary Table S2). To further investigate if there

133 is host genetic basis underlying alpha diversity, we constructed a polygenic score for

134 each alpha diversity indicator in the replication cohort, using the genetic variants

135 which showed suggestive significance (p<5×10-5) in the discovery GWAS. The

136 polygenic score was not significantly associated with its corresponding alpha diversity

137 index in our replication cohort. Meanwhile, none of the associations with alpha

138 diversity indices reported in the literature could be replicated (Supplementary Table

139 S7) 7.

140

141 We performed a beta diversity GWAS using a tool called MicrobiomeGWAS 31, and

142 found that one locus at SMARCA2 gene (rs6475456) was associated with

143 beta-diversity at a genome-wide significance level (p=3.96×10-9). However, we could

144 not replicate the results in the replication cohort, which may be due to the limited

7 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

8

145 sample size of the replication cohort. In addition, prior literature had reported 73

146 genetic variants that were associated with beta diversity 8,13,32,33, among which we

147 found that 3 single nucleotide polymorphisms (SNP, UHRF2 gene-rs563779, LHFPL3

148 gene-rs12705241, CTD-2135J3.4-rs11986935) had nominal significant (p<0.05)

149 association with beta-diversity in our cohort (Supplementary Table S6), although none

150 of the association survived Bonferroni correction. These studies used various methods

151 for the sequencing and calculation of beta diversity, which raised challenges to verify

152 and extrapolate results across populations.

153

154 We then took the genetic loci reported to be associated with individual taxa in prior

155 studies 7,8,13,33 for replication in our GNHS dataset. Although there are still some

156 signals with nominal significance (p<0.05) in our study (e.g., 7 loci associated with

157 , Coprococcus or Bacteroides with p<0.05; Supplementary Table S5),

158 none of the associations of these genetic variants with taxa survived the Bonferroni

159 correction (p<1×10-4). The null results may be because of various clustering

160 similarities, classifiers or reference databases to annotate taxa and different

161 sequencing methods used in these studies.

162

163 We subsequently performed GWAS discovery for individual gut microbes in our own

164 GNHS discovery dataset. For the taxa present at more than ninety percent of

165 participants and alpha diversity, we used Z-score normalization to transform the

166 distribution and carried out analysis based on a log-normal model. A MLMA test in

8 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

9

167 the GCTA was used to assess the association, with the first five principal components,

168 age, sex and sequencing batch fitted as fixed effects and the effects of all the SNPs

169 fitted as random effects 27,28,34. For other taxa present at fewer than ninety percent, we

170 transformed the absence/presence of the taxon into binary variables and used

171 PLINK1.9 to perform a logistic model, adjusted for the first five principal components,

172 age, sex and sequencing batch.

173

174 For all the gut microbiome taxa, the significant threshold was defined as 5×10-8 in the

175 discovery stage. As some taxa were correlated with each other, we also used an

176 eigendecomposition analysis to calculate the effective number of independent taxa on

177 each level (phylum level: 2.3, class level: 2.9, order level: 2.9, family level:

178 5.5, genus level: 5.6, level: 3.2) 35,36. We found that 6 taxa were associated

179 with host genetic variants in the discovery cohort (p<5×10-8/n, n is the effective

180 number of independent taxa on each taxonomy level, Supplementary Table S4);

181 however, these associations were not significant (p>0.05) in the replication cohort. We

182 then used a threshold of p<5×10-5 at the GWAS discovery stage to incorporate

183 additional genetic variants which may explain a larger proportion of heritability for

184 taxa, based on which we constructed a polygenic score for each taxon in the

185 replication. We found that the polygenic scores were significantly associated with 5

186 taxa including Saccharibacteria (also known as TM7 phylum), Clostridiaceae,

187 Comamonadaceae, Klebsiella and D168 species (belong to Desulfovibrio genus) in

188 the replication set (p<0.05, Methods, see also Figure 1A, 1B, 1C, 1D, 1E).

9 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

10

189

190 Genetic correlation of gut microbiome and traits

191 We used GCTA to perform a bivariate GREML (genomic-relatedness-based restricted

192 maximum-likelihood) analysis to estimate the genetic correlation between gut

193 microbiome and traits in the GNHS 27,37. The traits included BMI, fasting blood sugar

194 (FBS), glycosylated hemoglobin (HbA1c), systolic blood pressure (SBP), diastolic

195 blood pressure (DBP), high density lipoprotein cholesterol (HDL-C), low density

196 lipoprotein cholesterol (LDL-C), total cholesterol (TC) and triglyceride (TG), none of

197 which could pass Bonferroni correction. Additionally, HDL-C was the only trait that

198 had nominal genetic correlation (p<0.05) with gut microbes (specifically,

199 Desulfovibrionaceae and [Prevotella], Supplementary Table S3).

200

201 Bi-directional assessment of the genetically predicted association between gut

202 microbiome and complex diseases/traits

203 Using genetic variants-composed polygenic scores as genetic instruments, we

204 performed MR analysis to assess the putative causal effect of microbiome

205 (Saccharibacteria, Clostridiaceae, Comamonadaceae, Klebsiella and D168 species)

206 on human complex diseases or traits. Inverse variance weighted (IVW) method was

207 used for the MR analysis, while other three methods (Weighted median, MR-Egger

208 and MR-PRESSO) 38,39 were applied to confirm the robustness of results. The

209 horizontal pleiotropy was assessed using MR-PRESSO Global test and MR-Egger

210 Regression. For the analysis of gut microbiome on complex traits, we downloaded

10 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

11

211 public available GWAS summary statistics of complex traits (n=58) and diseases

212 (type 2 diabetes mellitus (T2DM), atrial fibrillation (AF), colorectal cancer (CRC)

213 and prostatic cancer (PCa)) reported by BioBank Japan 40-44. The result suggested that

214 Saccharibacteria could potentially decrease the concentration of serum creatinine

215 (p=0.008, Supplementary Table S9) and increase estimated glomerular filtration

216 rate(eGFR)( p=0.003, Supplementary Table S9), which helps improve renal function.

217 We did not find evidence of pleiotropic effect: genetic variants associated with

218 Saccharibacteria were not associated with any of the above traits (58 complex traits

219 and 4 disease outcomes, p<0.05/62). These taxa were not causally associated with

220 other human complex diseases or traits in our MR analyses, which may be due to the

221 limited genetic instruments discovered in our present study.

222

223 We subsequently performed a reserve MR analysis to assess the potential causal effect

224 of human complex diseases on gut microbiome features. For the reserve MR analyses,

225 the diseases of interests included T2DM, AF, coronary artery disease (CAD), chronic

226 kidney disease (CKD), Alzheimer's disease (AD), CRC and PCa, and their

227 instrumental variables for the MR analysis were based on the previous large-scale

228 GWAS in East Asians40,45-50. The results suggested that AF and CKD were causally

229 associated with gut microbiome (See also Figure 2A, 2B, Supplementary Table S10).

230 Genetically predicted higher risk of AF was associated with lower abundance of

231 Coprophilus, Lachnobacterium, Barnesiellaceae, Veillonellaceae and Mitsuokella,

232 and higher abundance of Alcaligenaceae. Additionally, genetically predicted higher

11 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

12

233 risk of CKD could increase Anaerostipes abundance and higher risk of PCa could

234 decrease [Prevotella].

235

236 To further investigate the potential complex diseases that may be correlated with the

237 taxa affected by AF, we applied Phylogenetic Investigation of Communities by

238 Reconstruction of Unobserved States (PICRUSt) to predict the disease pathway

239 abundance 51. We used Spearman's rank-order correlation to test whether 22 human

240 complex diseases were associated with the aforementioned AF-associated taxa (See

241 also Figure 2C). The heatmap indicated that cancers and neurodegenerative diseases

242 including Parkinson’s disease (PD), AD, amyotrophic lateral sclerosis (ALS) as well

243 as AF were correlated with similar gut microbiome. Although the association among

244 these diseases are highly supported by previous studies 52-54, no study has compared

245 common gut microbiome features across these different diseases.

246

247 Microbiome features of human complex diseases

248 To compare gut microbiome features across human diseases, we chose 22 human

249 complex diseases from predicted abundance and performed k-medoids clustering 18.

250 We used an matrix to perform the cluster analysis, where m is the number of

251 participantsm and × n n is the number of selected diseases. According to optimum average

252 silhouette width 55, we chose optimal number of clusters for further analysis (See also

253 Figure 3A). The plot showed that neurological diseases including ALS and AD

254 belonged to the same cluster, while PD and CRC had much similarity in gut

12 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

13

255 microbiome. The results also suggested that systemic lupus erythematosus (SLE) and

256 chronic myeloid leukemia (CML) shared similar gut microbiome features. Moreover,

257 we could replicate these clusters in our replication cohort, which suggested that the

258 clustering results were robust (See also Figure 3B).

259

260 We further asked whether gut microbiome contributed to the novel clustering. To this

261 end, we repeated the analysis among participants who took antibiotic less than two

262 weeks before stool sample collection, considering that antibiotic treatments were

263 believed to cause microbiome imbalance, and the clusters were totally different in this

264 group (See also Figure 3C). The results indicated a totally different clustering, which

265 suggested that gut microbiome indeed contributed to the correlations among diseases.

266 To further demonstrate common microbiome features across different diseases, we

267 examined the correlation of the predicted diseases with genus-level taxa. The results

268 showed that human complex diseases had shared similar gut microbiome features, as

269 well as distinct features on their own (See also Figure 4).

270

271 To validate the accuracy of the association between the predicted disease-related gut

272 microbiome features and the corresponding disease, we used T2DM as an example,

273 examining the association of predicted T2DM-related microbiome features with

274 T2DM risk in our GNHS samples. We constructed a microbiome risk score (MRS)

275 based on 16 selected taxa with predicted correlation coefficient with T2DM greater

276 than 0.2. A logistic regression was used to examine the above MRS with T2DM risk

13 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

14

277 in the GNHS (n=1886, with 217 T2DM cases). The results showed that higher MRS

278 was associated with lower risk of T2DM (odds ratio: 0.850, 95% confidence interval:

279 0.804 to 0.898, p=8.75×10-9).

280

281 Based on the above results, we proposed a hypothesis that related diseases might

282 share similar gut microbiome features. To test for this hypothesis, we performed

283 validation analysis by including GNHS participants who had one of the following

284 self-reported diseases: stroke (n=8), chronic hepatitis (n=19), coronary heart diseases

285 (CHD) (n=40), cataract (n=124) and insomnia (n=68). The results of k-medoids

286 clustering suggested that CHD, cataract and insomnia shared common gut

287 microbiome features, which was supported by the prior research reporting that both

288 patients suffering insomnia and women receiving cataract extraction had increased

289 risks of CHD 56-58.

290

291 Discussion

292 Our study is among the first to investigate the host genetics-gut microbiome

293 association in East Asian populations and reveals that several microbiome species

294 (e.g., Saccharibacteria and Klebsiella) are influenced by host genetics. We then show

295 that Saccharibacteria might causally improve renal function by affecting renal

296 function biomarkers (i.e., creatinine and eGFR). On the other hand, complex diseases

297 such as atrial fibrillation, chronic kidney disease and prostate cancer, have potential

298 causal effect on gut microbiome. More interestingly, our results indicate that different

14 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

15

299 human complex diseases may be mechanically correlated by sharing common gut

300 microbiome features, but also maintaining their own distinct microbiome features.

301

302 Previous studies and our study showed that gut microbiome had an inclination to be

303 influenced by host genetics 8,10,33,59, although the successful replication tends to be

304 rare. We could not validate any of the reported genetic variants that were significantly

305 associated with gut microbiome features in prior reports, which may reflect the

306 difference in population and heterogeneity between study but also raise concerns

307 about the reproducibility. Many factors including ethnic differences,

308 gene-environment interaction and dissimilarity in sequencing methods may make it

309 hard to extrapolate results from microbiome GWAS across populations in the

310 microbiome field. Nevertheless, we successfully replicate several polygenic scores of

311 gut microbiome, and the current study represent the largest dataset in Asian

312 populations and would be a unique resource to be used in large-scale trans-ethnic

313 meta-analysis of microbiome GWAS in future.

314

315 The MR analysis of gut microbiome on diseases or traits found that Saccharibacteria

316 might decrease the concentration of serum creatinine and increase eGFR. Little is

317 known about the Saccharibacteria as one of the uncultivated phyla, and it might be

318 essential for the immune response, oral inflammation and inflammatory bowel disease

319 as shown previously60-62. Our results also provide genetic instrument of

320 Saccharibacteria for further causal analysis with other complex diseases. The reverse

15 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

16

321 MR analysis provided evidence that AF, CKD and PCa could causally influence gut

322 microbiome. As our study is among the first to investigate gene-microbiome

323 association in East Asians, we need further study in this region to confirm our results.

324 Additionally, rare and low-frequency variants may have an important impact on

325 common diseases63, thus it will be of interest to clarify the effects of low-frequency

326 variants on gut microbiome in cohorts with large sample sizes in future.

327

328 Our results indicate that gut microbiome helps reveal novel and interesting

329 relationships among human complex diseases, suggesting that different diseases may

330 have common and distinct gut microbiome features. A prior study including

331 participants from different countries identified three microbiome clusters29. Notably,

332 this study focused on classifying the individuals into distinct enterotypes regardless of

333 the individuals’ health status, while in the present study we described representative

334 microbiome features for diseases of interest. We provide an approach to interpret the

335 data from mechanistic studies based on microbiome. The microbiome features

336 revealed a close association of AF with neurodegenerative diseases as well as cancers,

337 which was supported by prior studies showing that AF had correlation with AD and

338 PD52,53, and AF patients had relatively higher risks of several cancers including lung

339 cancer and CRC54,64. We also observed that microbiome features of SLE and CML

340 were highly similar. Interestingly, a tyrosine kinase inhibitor of platelet-derived

341 growth factor receptor, imatinib, was widely used to treat CML and significantly

342 ameliorated survival in murine lupus autoimmune disease65. In addition, association

16 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

17

343 between CRC and PD has been reported in several observational cohorts66,67.

344 Collectively, these findings strongly support our hypothesis that human complex

345 diseases sharing similar microbiome features might be mechanically correlated.

346 Furthermore, from the perspectives of risk genes of AF and neurodegenerative

347 diseases, previous GWAS for AF have identified two loci at PITX2 gene-rs6843082

348 and C9orf3 gene-rs7026071, which were also associated with the risk of ALS

349 (p=0.0138 and p=0.049, respectively) 68-70.

350

351 In summary, we perform bi-directional MR analyses and reveal some causal

352 relationships between abundance of gut microbiome and human complex diseases or

353 traits. The disease and gut microbiome feature analysis from population studies

354 reveals novel relationships among human complex diseases, which may help re-shape

355 our understanding about the disease etiology, as well as extending clinical indications

356 of existing drugs for different diseases.

357

358 Method

359 Study participants and sample collection

360 Our study was based on the Guangzhou Nutrition and Health Study (GNHS), with

361 4048 participants (40-75 years old) living in urban Guangzhou city recruited during

362 2008 and 2013 17. We followed up participants every three years. In the GNHS, stool

363 samples were collected among 1937 participants during follow-up visits. Among

364 those with stool samples, 1717 participants had genetic data and 1475 participants

17 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

18

365 with identical by decent (IBD) less than 0.185.

366

367 We included 199 participants with both genetic and gut microbiome data as a

368 replication cohort, which belonged to the control arm of a case-control study of hip

369 fracture with the participants (52-83 years old) recruited between June 2009 and

370 August 2015 in Guangdong Province, China 19.

371

372 Blood samples of all participants were collected after an overnight fasting and buffy

373 coat was separated from whole blood and stored at -80℃. Stool samples were

374 collected during the on-site visit of the participants at Sun Yat-sen University. All

375 samples were manually stirred, separated into tubes and stored at -80℃ within four

376 hours.

377

378 Genotyping data

379 For both discovery and replicattion cohorts, DNA was extracted from leukocyte using

380 the TIANamp® Blood DNA Kit as per the manufacturer’s instruction. DNA

381 concentrations were determined using the Qubit quantification system (Thermo

382 Scientific, Wilmington, DE, US). Extracted DNA was stored at -80°C. Genotyping

383 was carried out with Illumina ASA-750K arrays. Quality control and relatedness

384 filters were performed by PLINK1.9 20. Individuals with high or low proportion of

385 heterozygous genotypes (outliers defined as 3 standard deviation) were excluded71.

386 Individuals who had different ancestries (the first two principal components ±5

18 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

19

387 standard deviation from the mean) or related individuals (IBD>0.185) were

388 excluded71. Variants were mapping to the 1000 Genomes Phase3 v5 by SHAPEIT 23,72

389 and then we conducted the genome-wide genotype imputation with 1000 Genomes

390 Phase3 v5 reference panel by Minimac321,22. Genetic variants with imputation

391 accuracy RSQR > 0.3 and MAF > 0.05 were included in our analysis. We used

392 Pan-Asian reference panel consist of 502 participants and SNP2HLA v1.0.3 to impute

393 HLA region 24-26.

394

395 Sequencing and processing of 16S rRNA data

396 Microbial DNA was extracted from fecal samples using the QIAamp® DNA Stool

397 Mini Kit per the manufacturer’s instruction. DNA concentrations were determined

398 using the Qubit quantification system. The V3-V4 region of the 16S rRNA gene was

399 amplified from genomic DNA using primers 341F and 805R. Sequencing was

400 performed using MiSeq Reagent Kits v2 on the Illuimina MiSeq System.

401

402 Fastq-files were demultiplexed by the MiSeq Controller Software. Ultra-fast sequence

403 analysis (USEARCH) was performed to trim the sequence for amplification primers,

404 diversity spacers, sequencing adapters, merge-paired and quality filter73. Operational

405 taxonomic units (OTUs) were clustered based on 97% similarity using UPARSE74.

406 OTUs were annotated with Greengenes 13_8

407 (https://greengenes.secondgenome.com/). After randomly selecting 10000 reads for

408 each sample, Quantitative Insights into Microbial Ecology (QIIME) software version

19 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

20

409 1.9.0 was used to calculate alpha diversity (Shannon diversity index, Chao1 diversity

410 indices and observed OTUs index) based on the rarefied OTU counts75.

411

412 Statistical analysis

413 Genome-wide association analysis of gut microbiome features

414 For each of the GNHS participants and the replication cohort, we clustered

415 participants based on genus-level relative abundance, estimating the JSD distance and

416 PAM clustering algorithm, and then defined two enterotypes according to

417 Calinski-Harabasz Index29,76. PLINK 1.9 was used to perform a logistic regression

418 model for enterotypes and taxa present at fewer than ninety percent, adjusted for age,

419 sex and the first five principal components.

420

421 For beta diversity, the analyses for the genome-wide host genetic variants with beta

422 diversity was performed using MicrobiomeGWAS31, adjusted for covariates including

423 the first five principal components, age and sex. We filtered OTUs present at fewer

424 than ten percent of participants to calculate Bray–Curtis dissimilarity.

425

426 Alpha diversity was calculated after randomly sampling 10000 reads per sample. For

427 the taxa present at more than ninety percent of participants and alpha diversity, we

428 used Z-score normalization to transform the distribution and carried out analysis

429 based on a log-normal model. A mixed linear model based association (MLMA) test

430 in GCTA was used to assess the association, fitting the first five principal components,

20 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

21

431 age, sex and sequencing batch as fixed effects and the effects of all the SNPs as

432 random effects 27,28,34. For other taxa present at fewer than ninety percent, we

433 transformed the absence/presence of the taxon into binary variables and used

434 PLINK1.9 to perform a logistic model, adjusted for the first five principal components,

435 age, sex and sequencing batch. For all the gut microbiome features, the significant

436 threshold was defined as 5×10-8 /n (n is the effective number of independent taxa on

437 each taxonomy level) in the discovery stage. We estimated genomic inflation factors

438 with LDSC v1.0.1 at local server 77.

439

440 Proportion of variance explained by all SNPs

441 We used the GREML method in GCTA to estimate the proportion of variance

442 explained by all SNPs 30. The taxa were divided into two groups based on whether the

443 taxa were present in the ninety percent of participants or not. For alpha diversity and

444 taxa, our model was adjusted for constrain covariates including sex and sequencing

445 batch, as well as quantitative covariates including the first five principal components

446 and age. The model was adjusted for the same covariates except for sequencing batch

447 for analysis of enterotype.

448

449 Genetic correlation of gut microbiome and traits

450 We used GCTA to perform a bivariate GREML analysis to estimate the genetic

451 correlation between gut microbiome and traits in the GNHS27,37. The gut microbiome

452 was divided into two groups according to the previous description. We used

21 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

22

453 continuous variables to taxa present at more than ninety percent of participants and

454 traits. For taxa present at fewer than ninety percent of participants, we used binary

455 variables according to the absence/presence of taxa. This analysis included traits such

456 as BMI, FBS, HbA1c, SBP, DBP, HDL-C, LDL-C, TC and TG.

457

458 Constructing polygenic scores for taxa and alpha diversity

459 We selected lead SNPs using PLINK v1.9 with the ‘—clump’ command to clump

460 SNPs that p value < 5×10-5 and r2<0.1 within 0.1 cM. We used beta coefficients as

461 weight to construct polygenic scores for taxa and alpha diversity. For alpha diversity

462 and taxa present at more than ninety percent participants, we constructed weighted

463 polygenic scores and performed the analysis on a general linear model with a negative

464 binomial distribution to test for association between the polygenic scores and taxa,

465 adjusted for the first five principal components, age, sex and sequencing batch. We

466 used weighted polygenic scores and logistic regression to the absence/presence taxa,

467 adjusted for the same covariates as in the above analysis. Taxa with significance

468 (p<0.05) in the replication cohort were included for the further analysis.

469

470 The effective number of independent taxa

471 As some taxa were correlated with each other, we used an eigendecomposition

472 analysis to calculate the effective number of independent taxa on each taxonomy level

473 35,36. Matrix M is an m×n matrix, where m is the number of participants and n is the

474 number of taxa on the corresponding taxa level. Matrix A is the variance-covariance

22 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

23

475 matrix of matrix M. P is the matrix of eigenvectors. is the diagonal

476 matrix comprised of the ordered eigenvalues, whichdiag can be calculated as:

477 The effective number of independentdiag taxa can be calculated as:

478

479 Bi-directional MR analysis

480 In the analysis of potential causal effect of gut microbiome features on diseases, we

481 used independent genetic variants (selected as part of the polygenic score analysis) as

482 the instrument variables. For each trait, we excluded instrument variables that showed

483 significant association with the trait (p<0.05/n, n is number of independent genetic

484 variants). In the analysis of potential causal effect of diseases on gut microbiome

485 features, we selected genetic variants that were replicated in East Asian populations as

486 instrument variables. As all instrument variables were from East Asian populations,

487 we chose independent genetic variants (r2<0.1) based on GNHS cohort. We identified

488 the best proxy (r2 > 0.8) based on GNHS cohort or discarded the variant if no proxy

489 was available. We used inverse variance weighted (IVW) method to estimate effect

490 size. To confirm the robustness of results, we performed other three MR methods

491 including weighted median, MR-Egger and MR-PRESSO. To assess the presence of

492 horizontal pleiotropy, we performed MR-PRESSO Global test and MR-Egger

493 Regression. Effect sizes of gut microbiome on traits were dependent on units of

494 traits43 (Supplementary table S1). Results of human complex diseases on the 23 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

24

495 absence/presence gut microbiome were presented as risk of presence (vs absence) of

496 the microbiome per log odds difference of the disease. Results of diseases on other

497 gut microbiome and alpha diversity were presented as changes in abundance of taxa

498 (10-SD of log transformed) per log odds difference of the respective disease.

499

500 The statistical significance of gut microbiome on traits and diseases was defined as

501 p<0.0008 (0.05/62). In addition, the statistical significance of diseases on gut

502 microbiome features was defined as p<0.05/n (n is the effective number of

503 independent taxa on the corresponding taxonomy level). Results that could not pass

504 Bonferroni adjustment but p<0.05 in all four MR methods were considered as

505 potential causal relationships. We performed MR analyses on R v3.5.3.

506

507 Pathway analysis

508 We used OTUs by QIIME and annotated the variation of functional genes with

509 Phylogenetic Investigation of Communities by Reconstruction of Unobserved States

510 (PICRUSt) 51. The pathways and diseases were annotated using KEGG 78-80. We used

511 Spearman's rank-order correlation to investigate association of predicted pathway or

512 diseases abundance with taxa. In the heatmap, diseases were clustered with ‘hcluster’

513 function on R. To test whether non-normalized pathway or disease abundance was

514 associated with each other, we used SPIEC-EASI to test the interaction relationship,

515 and then used Cytoscape v3.7.2 to visualize the interaction network 81,82.

516

24 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

25

517 Construction of the microbiome risk score

518 To construct a microbiome risk score for T2DM, we used a Spearman's rank-order

519 correlation to select taxa with the absolute value of correlation coefficient higher than

520 0.2. Score for each taxon abundance <5% quantile in our study was defined as 0. For

521 those above 5%, score for each taxon showing negative association with T2DM was

522 defined as 1; score for each taxon showing positive association with T2DM was

523 defined as -1. We then summed up values from all taxa. We selected logistic

524 regression model to estimate association of the MRS with T2DM risk, and linear

525 model to estimate the association of the MRS with the continuous variables, adjusted

526 for age, sex, dietary energy intake, alcohol intake and BMI at the time of sample

527 collection.

528

529 Clustering diseases

530 The clustering analysis was carried out with ‘cluster’ and ‘factoextra’ for plot on R.

531 We performed PAM algorithm based on predicted abundance of diseases or average

532 relative abundance after Z-score normalization 83. PAM algorithm searches k medoids

533 among the observations and then found nearest medoids to minimize the dissimilarity

534 among clusters 18. Given a set of objects , , the dissimilarity between

535 objects and is denoted by The assignment of object i to the

536 representative object j is denoted byd . is a binary variable and is 1 if object i

537 belongs to the cluster of the representative object j. The function to minimize the

538 model is given by:

25 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

26

539

540 To identify the optimal cluster number, we used ‘pamk’ function in R to determine the

541 optimum average silhouette width. For each object i, we defined as the average

542 dissimilarity between object i and all other objects within its cluster. For the

543 remaining clusters, represents the average dissimilarity between i and all

544 objects in cluster bThe h minimum dissimilarity can be calculated by:

545 w .

546 The silhouette width for object i can be h calculated h by:

h 547 Then we calculated the average of silhouettet width for each object. The cluster

548 number is determined by the number of which the average silhouette width is

549 maximum.

550

551 Acknowledgments

552 This study was funded by National Natural Science Foundation of China (81903316,

553 81773416), Westlake University (101396021801) and the 5010 Program for Clinical

554 Researches (2007032) of the Sun Yat-sen University (Guangzhou, China). The

555 authors declare no conflict of interest. We thank the Westlake University

556 Supercomputer Center for providing computing and data analysis service for the

557 present project.

558 26 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

27

559 Data availability

560 The raw data for 16 S rRNA gene sequences are available in the CNSA

561 (https://db.cngb.org/cnsa/) of CNGBdb at accession number CNP0000829.

562 563 564 565 References 566 1. Awany, D. et al. Host and Microbiome Genome-Wide Association Studies: 567 Current State and Challenges. Front Genet 9, 637 (2018). 568 2. Bull, M.J. & Plummer, N.T. Part 1: The Human Gut Microbiome in Health 569 and Disease. Integr Med (Encinitas) 13, 17-22 (2014). 570 3. Lynch, J.B. & Hsiao, E.Y. Microbiomes as sources of emergent host 571 phenotypes. Science 365, 1405 (2019). 572 4. Allegretti, J.R., Mullish, B.H., Kelly, C. & Fischer, M. The evolution of the 573 use of faecal microbiota transplantation and emerging therapeutic indications. 574 The Lancet 394, 420-431 (2019). 575 5. Wong, S.H. & Yu, J. Gut microbiota in colorectal cancer: mechanisms of 576 action and clinical applications. Nature Reviews Gastroenterology & 577 Hepatology (2019). 578 6. Davies, N.M., Holmes, M.V. & Davey Smith, G. Reading Mendelian 579 randomisation studies: a guide, glossary, and checklist for clinicians. BMJ 362, 580 k601 (2018). 581 7. Turpin, W. et al. Association of host genome with intestinal microbial 582 composition in a large healthy cohort. Nat Genet 48, 1413-1417 (2016). 583 8. Wang, J. et al. Genome-wide association analysis identifies variation in 584 vitamin D receptor and other host factors influencing the gut microbiota. Nat 585 Genet 48, 1396-1406 (2016). 586 9. Goodrich, J.K. et al. Human genetics shape the gut microbiome. Cell 159, 587 789-99 (2014). 588 10. Goodrich, J.K. et al. Genetic Determinants of the Gut Microbiome in UK 589 Twins. Cell Host Microbe 19, 731-43 (2016). 590 11. Ganesan, K., Chung, S.K., Vanamala, J. & Xu, B. Causal Relationship 591 between Diet-Induced Gut Microbiota Changes and Diabetes: A Novel 592 Strategy to Transplant Faecalibacterium prausnitzii in Preventing Diabetes. Int 593 J Mol Sci 19(2018). 594 12. He, Y. et al. Regional variation limits applications of healthy gut microbiome 595 reference ranges and disease models. Nat Med 24, 1532-1535 (2018). 596 13. Rothschild, D. et al. Environment dominates over host genetics in shaping 597 human gut microbiota. Nature 555, 210-215 (2018). 598 14. Duvallet, C., Gibbons, S.M., Gurry, T., Irizarry, R.A. & Alm, E.J. 599 Meta-analysis of gut microbiome studies identifies disease-specific and shared

27 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

28

600 responses. Nature communications 8, 1784-1784 (2017). 601 15. Cheng, S. et al. Identifying psychiatric disorder-associated gut microbiota 602 using microbiota-related gene set enrichment analysis. Briefings in 603 Bioinformatics (2019). 604 16. Jackson, M.A. et al. Gut microbiota associations with common diseases and 605 prescription medications in a population-based cohort. Nature 606 Communications 9, 2655 (2018). 607 17. Cao, Y. et al. Association of magnesium in serum and urine with carotid 608 intima-media thickness and serum lipids in middle-aged and elderly Chinese: a 609 community-based cross-sectional study. European journal of nutrition 610 55(2015). 611 18. Kaufman, L. & Rousseeuw, P. Partitioning Around Medoids (Program PAM). 612 68-125 (1990). 613 19. Sun, L.-L. et al. Associations between the dietary intake of antioxidant 614 nutrients and the risk of hip fracture in elderly Chinese: A case-control study. 615 The British journal of nutrition 112, 1-9 (2014). 616 20. Purcell, S. et al. PLINK: a tool set for whole-genome association and 617 population-based linkage analyses. American journal of human genetics 81, 618 559-575 (2007). 619 21. Das, S. et al. Next-generation genotype imputation service and methods. 620 Nature Genetics 48, 1284 (2016). 621 22. Clarke, L. et al. The international Genome sample resource (IGSR): A 622 worldwide collection of genome variation incorporating the 1000 Genomes 623 Project data. Nucleic Acids Research 45, D854-D859 (2016). 624 23. Delaneau, O. et al. Integrating sequence and array data to create an improved 625 1000 Genomes Project haplotype reference panel. Nature Communications 5, 626 3934 (2014). 627 24. Okada, Y. et al. Risk for ACPA-positive rheumatoid arthritis is driven by 628 shared HLA amino acid polymorphisms in Asian and European populations. 629 Hum Mol Genet 23, 6916-26 (2014). 630 25. Pillai, N.E. et al. Predicting HLA alleles from high-resolution SNP data in 631 three Southeast Asian populations. Hum Mol Genet 23, 4443-51 (2014). 632 26. Jia, X. et al. Imputing Amino Acid Polymorphisms in Human Leukocyte 633 Antigens. PLOS ONE 8, e64683 (2013). 634 27. Yang, J., Lee, S.H., Goddard, M.E. & Visscher, P.M. GCTA: a tool for 635 genome-wide complex trait analysis. Am J Hum Genet 88, 76-82 (2011). 636 28. Yang, J., Zaitlen, N.A., Goddard, M.E., Visscher, P.M. & Price, A.L. 637 Advantages and pitfalls in the application of mixed-model association 638 methods. Nat Genet 46, 100-6 (2014). 639 29. Arumugam, M. et al. Enterotypes of the human gut microbiome. Nature 473, 640 174-80 (2011). 641 30. Lee, S.H., Wray, N.R., Goddard, M.E. & Visscher, P.M. Estimating missing 642 heritability for disease from genome-wide association studies. Am J Hum 643 Genet 88, 294-305 (2011).

28 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

29

644 31. Hua, X. et al. MicrobiomeGWAS: a tool for identifying host genetic variants 645 associated with microbiome composition. bioRxiv, 031187 (2015). 646 32. Ruhlemann, M.C. et al. Application of the distance-based F test in an mGWAS 647 investigating beta diversity of intestinal microbiota identifies variants in 648 SLC9A8 (NHE8) and 3 other loci. Gut Microbes 9, 68-75 (2018). 649 33. Bonder, M.J. et al. The effect of host genetics on the gut microbiome. Nat 650 Genet 48, 1407-1412 (2016). 651 34. Yang, J. et al. Common SNPs explain a large proportion of the heritability for 652 human height. Nature Genetics 42, 565 (2010). 653 35. Bretherton, C.S., Widmann, M., Dymnikov, V.P., Wallace, J.M. & Bladé, I. 654 The Effective Number of Spatial Degrees of Freedom of a Time-Varying Field. 655 Journal of Climate 12, 1990-2009 (1999). 656 36. Wang, H. et al. Genotype-by-environment interactions inferred from genetic 657 effects on phenotypic variability in the UK Biobank. Science Advances 5, 658 eaaw3538 (2019). 659 37. Lee, S.H., Yang, J., Goddard, M.E., Visscher, P.M. & Wray, N.R. Estimation of 660 pleiotropy between complex diseases using single-nucleotide 661 polymorphism-derived genomic relationships and restricted maximum 662 likelihood. Bioinformatics 28, 2540-2 (2012). 663 38. Sanna, S. et al. Causal relationships among the gut microbiome, short-chain 664 fatty acids and metabolic diseases. Nature Genetics 51, 600-605 (2019). 665 39. Verbanck, M., Chen, C.-Y., Neale, B. & Do, R. Detection of widespread 666 horizontal pleiotropy in causal relationships inferred from Mendelian 667 randomization between complex traits and diseases. Nature Genetics 50, 668 693-698 (2018). 669 40. Low, S.K. et al. Identification of six new genetic loci associated with atrial 670 fibrillation in the Japanese population. Nat Genet 49, 953-958 (2017). 671 41. Suzuki, K. et al. Identification of 28 new susceptibility loci for type 2 diabetes 672 in the Japanese population. Nat Genet 51, 379-386 (2019). 673 42. Akiyama, M. et al. Genome-wide association study identifies 112 new loci for 674 body mass index in the Japanese population. Nat Genet 49, 1458-1467 (2017). 675 43. Kanai, M. et al. Genetic analysis of quantitative traits in the Japanese 676 population links cell types to complex human diseases. Nat Genet 50, 390-400 677 (2018). 678 44. Matoba, N. et al. GWAS of smoking behaviour in 165,436 Japanese people 679 reveals seven new loci and shared genetic architecture. Nat Hum Behav 3, 680 471-477 (2019). 681 45. Lu, X.F. et al. Genome-wide association study in Han Chinese identifies four 682 new susceptibility loci for coronary artery disease. Nature Genetics 44, 890-+ 683 (2012). 684 46. Marzec, J. et al. A genetic study and meta-analysis of the genetic 685 predisposition of prostate cancer in a Chinese population. Oncotarget 7, 686 21393-403 (2016). 687 47. Okada, Y. et al. Meta-analysis identifies multiple loci associated with kidney

29 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

30

688 function-related traits in east Asian populations. Nat Genet 44, 904-9 (2012). 689 48. Zeng, C. et al. Identification of Susceptibility Loci and Genes for Colorectal 690 Cancer Risk. Gastroenterology 150, 1633-1645 (2016). 691 49. Zhou, X. et al. Identification of genetic risk factors in the Chinese population 692 implicates a role of immune system in Alzheimer’s disease pathogenesis. 693 Proceedings of the National Academy of Sciences 115, 1697 (2018). 694 50. Gan, W. et al. Evaluation of type 2 diabetes genetic risk variants in Chinese 695 adults: findings from 93,000 individuals from the China Kadoorie Biobank. 696 Diabetologia 59, 1446-1457 (2016). 697 51. Langille, M.G.I. et al. Predictive functional profiling of microbial 698 communities using 16S rRNA marker gene sequences. Nature Biotechnology 699 31, 814 (2013). 700 52. Canga, Y. et al. Assessment of Atrial Conduction Times in Patients with 701 Newly Diagnosed Parkinson's Disease. Parkinsons Dis 2018, 2916905 (2018). 702 53. Ihara, M. & Washida, K. Linking Atrial Fibrillation with Alzheimer's Disease: 703 Epidemiological, Pathological, and Mechanistic Evidence. J Alzheimers Dis 704 62, 61-72 (2018). 705 54. Conen, D. et al. Risk of Malignant Cancer Among Women With New-Onset 706 Atrial FibrillationAtrial Fibrillation and Risk of CancerAtrial Fibrillation and 707 Risk of Cancer. JAMA Cardiology 1, 389-396 (2016). 708 55. Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and 709 validation of cluster analysis. Journal of Computational and Applied 710 Mathematics 20, 53-65 (1987). 711 56. Hu, F.B. et al. Prospective Study of Cataract Extraction and Risk of Coronary 712 Heart Disease in Women. American Journal of Epidemiology 153, 875-881 713 (2001). 714 57. Javaheri, S. & Redline, S. Insomnia and Risk of Cardiovascular Disease. Chest 715 152, 435-444 (2017). 716 58. Strand, L.B. et al. Self-reported sleep duration and coronary heart disease 717 mortality: A large cohort study of 400,000 Taiwanese adults. International 718 Journal of Cardiology 207, 246-251 (2016). 719 59. Blekhman, R. et al. Host genetic variation impacts microbiome composition 720 across human body sites. Genome Biol 16, 191 (2015). 721 60. Kuehbacher, T. et al. Intestinal TM7 bacterial phylogenies in active 722 inflammatory bowel disease. Journal of Medical Microbiology 57, 1569-1576 723 (2008). 724 61. He, X. et al. Cultivation of a human-associated TM7 phylotype reveals a 725 reduced genome and epibiotic parasitic lifestyle. Proceedings of the National 726 Academy of Sciences of the United States of America 112, 244-249 (2015). 727 62. Bor, B., Bedree, J.K., Shi, W., McLean, J.S. & He, X. Saccharibacteria (TM7) 728 in the Human Oral Microbiome. Journal of Dental Research 98, 500-509 729 (2019). 730 63. Cirulli, E.T. & Goldstein, D.B. Uncovering the roles of rare variants in 731 common disease through whole-genome sequencing. Nat Rev Genet 11,

30 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

31

732 415-25 (2010). 733 64. Vinter, N., Christesen Amanda, M.S., Fenger‐Grøn, M., Tjønneland, A. & 734 Frost, L. Atrial Fibrillation and Risk of Cancer: A Danish Population‐Based 735 Cohort Study. Journal of the American Heart Association 7, e009543 (2018). 736 65. Zoja, C. et al. Imatinib ameliorates renal disease and survival in murine lupus 737 autoimmune disease. Kidney International 70, 97-103 (2006). 738 66. Boursi, B., Mamtani, R., Haynes, K. & Yang, Y.-X. Parkinson's disease and 739 colorectal cancer risk-A nested case control study. Cancer epidemiology 43, 740 9-14 (2016). 741 67. Xie, X., Luo, X. & Xie, M. Association between Parkinson's disease and risk 742 of colorectal cancer. Parkinsonism & Related Disorders 35, 42-47 (2017). 743 68. van Rheenen, W. et al. Genome-wide association analyses identify new risk 744 variants and the genetic architecture of amyotrophic lateral sclerosis. Nat 745 Genet 48, 1043-8 (2016). 746 69. Lambert, J.C. et al. Meta-analysis of 74,046 individuals identifies 11 new 747 susceptibility loci for Alzheimer's disease. Nat Genet 45, 1452-8 (2013). 748 70. Pankratz, N. et al. Meta-analysis of Parkinson's disease: identification of a 749 novel locus, RIT2. Annals of neurology 71, 370-384 (2012). 750 71. Anderson, C.A. et al. Data quality control in genetic case-control association 751 studies. Nat Protoc 5, 1564-73 (2010). 752 72. Delaneau, O., Marchini, J. & Zagury, J.-F. A linear complexity phasing method 753 for thousands of genomes. Nature Methods 9, 179 (2011). 754 73. Edgar, R.C. Search and clustering orders of magnitude faster than BLAST. 755 Bioinformatics 26, 2460-2461 (2010). 756 74. Edgar, R.C. UPARSE: highly accurate OTU sequences from microbial 757 amplicon reads. Nat Methods 10, 996-8 (2013). 758 75. Caporaso, J.G. et al. QIIME allows analysis of high-throughput community 759 sequencing data. Nat Methods 7, 335-6 (2010). 760 76. Caliński, T. & Harabasz, J. A dendrite method for cluster analysis. 761 Communications in Statistics 3, 1-27 (1974). 762 77. Bulik-Sullivan, B.K. et al. LD Score regression distinguishes confounding 763 from polygenicity in genome-wide association studies. Nature Genetics 47, 764 291-295 (2015). 765 78. Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. 766 Nucleic Acids Res 28, 27-30 (2000). 767 79. Kanehisa, M., Sato, Y., Furumichi, M., Morishima, K. & Tanabe, M. New 768 approach for understanding genome variations in KEGG. Nucleic Acids Res 769 47, D590-D595 (2019). 770 80. Kanehisa, M. Toward understanding the origin and evolution of cellular 771 organisms. Protein Sci (2019). 772 81. Shannon, P. et al. Cytoscape: a software environment for integrated models of 773 biomolecular interaction networks. Genome research 13, 2498-2504 (2003). 774 82. Kurtz, Z.D. et al. Sparse and Compositionally Robust Inference of Microbial 775 Ecological Networks. PLOS Computational Biology 11, e1004226 (2015).

31 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

32

776 83. Reynolds, A.P., Richards, G., de la Iglesia, B. & Rayward-Smith, V.J. 777 Clustering Rules: A Comparison of Partitioning and Hierarchical Clustering 778 Algorithms. Journal of Mathematical Modelling and Algorithms 5, 475-504 779 (2006). 780

32 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

33

781 Figure legends 782 Figure 1 Association of host polygenic score with gut microbiome. The participants 783 were divided into high and low polygenic score group according to median levels of 784 the polygenic score. The dots on the right of the box represent the distribution of 785 polygenic score. The dash line in the box is the position of median line and the solid 786 line is the position of mean line. The length of box depends on upper quartile and 787 lower quartile of datum. Sample size at the discovery stage is 1475, and that at 788 replication stage is 199. (A). Correlation of Saccharibacteria abundance with the 789 polygenic score (including 51 lead SNPs, Supplementary Table S8). (B). Correlation 790 of Clostridiaceae abundance with the polygenic score (including 37 lead SNPs, 791 Supplementary Table S8). (C). Correlation of Klebsiella abundance with the 792 polygenic score (including 47 lead SNPs, Supplementary Table S8). (D). Correlation 793 of Comamonadaceae presence with the polygenic score (including 37 lead SNPs, 794 Supplementary Table S8). (E). Correlation of D168 species presence with the 795 polygenic score (including 34 lead SNPs, Supplementary Table S8). 796 797 Figure 2 Effect of host genetically predicted higher atrial fibrillation risk on gut 798 microbiome. (A). Causal association of atrial fibrillation with abundance of 799 Burkholderiales, Alcaligenaceae, Lachnobacterium and Coprophilus. The effect sizes 800 of atrial fibrillation on taxa are changes in abundance of (10-SD of 801 log-transformed) per genetically determined higher log odds of atrial fibrillation. (B). 802 Causal association of atrial fibrillation with presence of Barnesiellaceae, 803 Veillonellaceae_undefined and Mitsuokella. The effect size of atrial fibrillation on 804 taxa are present as odds ratio increase in log odds of atrial fibrillation. (C). The heat 805 map shows correlation of AF-associated taxa with predicted diseases. The grey 806 components show no significance of correlation with Bonferroni correction (p>0.05/ 807 (5.6*22), p>0.0004). 808 809 Figure 3 Association and cluster of diseases predicted by the gut microbiome. (A). 810 Plot of clusters in Guangzhou Nutrition and Health Study (GNHS) cohort (n=1919). 811 (B). Plot of cluster results in the replication cohort (n=217). (C). Plot of 5 clusters in 812 antibiotic-taking participants (n=18). The optimal cluster is 5 in GNHS cohort and 6 813 in the replication. The clusters share consistent components between two studies. In 814 contrast, components are different between antibiotic-taking participants and control 815 groups. Dimension1 (Dim1) and dimension2 (Dim2) can explain 40.1% and 13.1% 816 variance, respectively in GNHS cohort. The annotation for variables is as following. 817 AT: African trypanosomiasis, AD: Alzheimer's disease, V1: Amoebiasis, ALS: 818 Amyotrophic lateral sclerosis, BC: Bladder cancer, CD: Chagas disease, CML: 819 Chronic myeloid leukemia, CRC: Colorectal cancer, V2: Hepatitis C, HD: 820 Huntington's disease, HCM: Hypertrophic cardiomyopathy, V3: Influenza A, PD: 821 Parkinson's disease, V4: Pathways in cancer, V5: Prion disease, PCa: Prostate cancer, 822 RCC: Renal cell carcinoma, SLE: Systemic lupus erythematosus, V6: Tuberculosis, 823 T1DM: Type I diabetes mellitus, T2DM: Type II diabetes mellitus, V7: Vibrio

33 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

34

824 cholerae infection. (D). Plot of clusters in GNHS patients. Patients get only one of the 825 follow diseases: stroke (n=8), chronic hepatitis (n=19), coronary heart diseases (n=40), 826 cataract (n=124) and insomnia (n=68). (E). Gut microbiome-predicted network of 827 relationship among different human complex diseases. The interaction is determined 828 by SPIEC-EASI with non-normalized predicted abundance data. 829 830 Figure 4 Correlation of the human complex diseases with gut microbiome on 831 genus level. The heat map shows correlation of predicted diseases and gut 832 microbiome on genus level. The grey components show no significance of correlation 833 with Bonferroni correction (p>0.05/ (5.6*22), p>0.0004). 834

34 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

35

835 836 Figure 1 Association of host polygenic score with gut microbiome. The participants 837 were divided into high and low polygenic score group according to median levels of 838 the polygenic score. The dots on the right of the box represent the distribution of 839 polygenic score. The dash line in the box is the position of median line and the solid 840 line is the position of mean line. The length of box depends on upper quartile and 841 lower quartile of datum. Sample size at the discovery stage is 1475, and that at 842 replication stage is 199. (A). Correlation of Saccharibacteria abundance with the 843 polygenic score (including 51 lead SNPs, Supplementary Table S8). (B). Correlation 844 of Clostridiaceae abundance with the polygenic score (including 37 lead SNPs, 845 Supplementary Table S8). (C). Correlation of Klebsiella abundance with the 846 polygenic score (including 47 lead SNPs, Supplementary Table S8). (D). Correlation 847 of Comamonadaceae presence with the polygenic score (including 37 lead SNPs, 848 Supplementary Table S8). (E). Correlation of D168 species presence with the 849 polygenic score (including 34 lead SNPs, Supplementary Table S8). A B

35 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

36

850

36 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

37

851 Figure 2 Effect of host genetically predicted higher atrial fibrillation risk on gut 852 microbiome. (A). Causal association of atrial fibrillation with abundance of 853 Burkholderiales, Alcaligenaceae, Lachnobacterium and Coprophilus. The effect sizes 854 of atrial fibrillation on taxa are changes in abundance of bacteria (10-SD of 855 log-transformed) per genetically determined higher log odds of atrial fibrillation. (B). 856 Causal association of atrial fibrillation with presence of Barnesiellaceae, 857 Veillonellaceae_undefined and Mitsuokella. The effect size of atrial fibrillation on 858 taxa are present as odds ratio increase in log odds of atrial fibrillation. (C). The heat 859 map shows correlation of AF-associated taxa with predicted diseases. The grey 860 components show no significance of correlation with Bonferroni correction (p>0.05/ 861 (5.6*22), p>0.0004). 862 A B

C

863 864

37 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

38

865 Figure 3 Association and cluster of diseases predicted by the gut microbiome. (A). 866 Plot of clusters in Guangzhou Nutrition and Health Study (GNHS) cohort (n=1919). 867 (B). Plot of cluster results in the replication cohort (n=217). (C). Plot of 5 clusters in 868 antibiotic-taking participants (n=18). The optimal cluster is 5 in GNHS cohort and 6 869 in the replication. The clusters share consistent components between two studies. In 870 contrast, components are different between antibiotic-taking participants and control 871 groups. Dimension1 (Dim1) and dimension2 (Dim2) can explain 40.1% and 13.1% 872 variance, respectively in GNHS cohort. The annotation for variables is as following. 873 AT: African trypanosomiasis, AD: Alzheimer's disease, V1: Amoebiasis, ALS: 874 Amyotrophic lateral sclerosis, BC: Bladder cancer, CD: Chagas disease, CML: 875 Chronic myeloid leukemia, CRC: Colorectal cancer, V2: Hepatitis C, HD: 876 Huntington's disease, HCM: Hypertrophic cardiomyopathy, V3: Influenza A, PD: 877 Parkinson's disease, V4: Pathways in cancer, V5: Prion disease, PCa: Prostate cancer, 878 RCC: Renal cell carcinoma, SLE: Systemic lupus erythematosus, V6: Tuberculosis, 879 T1DM: Type I diabetes mellitus, T2DM: Type II diabetes mellitus, V7: Vibrio 880 cholerae infection. (D). Plot of clusters in GNHS patients. Patients get only one of the 881 follow diseases: stroke (n=8), chronic hepatitis (n=19), coronary heart diseases (n=40), 882 cataract (n=124) and insomnia (n=68). (E). Gut microbiome-predicted network of 883 relationship among different human complex diseases. The interaction is determined 884 by SPIEC-EASI with non-normalized predicted abundance data. 885 886 A B

C D

38 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

39

E

887 888

39 bioRxiv preprint doi: https://doi.org/10.1101/2019.12.26.888313; this version posted January 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

40

889 Figure 4 Correlation of the human complex diseases with gut microbiome. The 890 heat map shows correlation of predicted diseases and gut microbiome on genus level. 891 The grey components show no significance of correlation with Bonferroni correction 892 (p>0.05/ (5.6*22), p>0.0004).

893

40