medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Oral, intestinal, and pancreatic microbiomes are correlated and exhibit co-

abundance in patients with pancreatic cancer and other gastrointestinal

diseases

Mei Chung1, Naisi Zhao1, Richard Meier2, Devin C. Koestler2,3, Erika Del Castillo,4 Guojun Wu5,

Bruce J. Paster4,6, Kevin Charpentier7, Jacques Izard8,9, Karl T. Kelsey10, and Dominique S.

Michaud1*

Affiliations:

1 Department of Public Health and Community Medicine, School of Medicine, Tufts University,

Boston, MA, USA

2 Department of Biostatistics, University of Kansas Medical Center, Kansas City, KS, USA

3 University of Kansas Cancer Center, The University of Kansas Medical Center, Kansas City,

KS, USA.

4 The Forsyth Institute, Cambridge, MA, USA.

5 Department of Biochemistry and Microbiology, Center for Nutrition, Microbiome and Health,

New Jersey Institute for Food, Nutrition and Health, Rutgers University, New Brunswick, NJ,

USA.

6 Harvard School of Dental Medicine, Boston, MA, USA.

7 Department of Surgery, Rhode Island Hospital, Providence, Rhode Island.

8 Department of Food Science and Technology, University of Nebraska, Lincoln, NE, USA

9 Fred and Pamela Buffet Cancer Center, University of Nebraska Medical Center, Omaha, NE,

USA

10 Center for Environmental Health and Technology, Brown University, Providence, RI, USA

NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice. 1 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

*Correspondence: Dominique S. Michaud, ScD Tufts University School of Medicine 136 Harrison Avenue Boston, MA 02111 Phone: (617)-636-0482 Email: [email protected]

2 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Abstract

Oral microbiota are believed to play important roles in systemic diseases, including cancer. We

collected oral swabs and at least one pancreatic tissue or intestinal samples from 52 subjects, and

characterized 16S ribosomal RNA genes using high-throughput DNA sequencing in a total 324

samples. We identified a total of 73 unique Amplicon Sequence Variants (ASVs) that were

shared between oral and pancreatic or intestinal samples. Accounting for pairing and within-

subject correlation, 7 ASVs showed significant concordance (Kappa statistics) and 5 ASVs

exhibited significant or marginally significant Pairwise Stratified Association (PASTA) between

oral samples and pancreatic tissue or intestinal samples. Of these, two specific bacterial species

( morbillorum and Fusobacterium nucleatum subsp. vincentii) showed consistent

presence or absence patterns between oral and intestinal or pancreatic samples. Lastly, our

microbial co-abundance analyses showed several distinct ASVs clusters and complex

correlation-networks between ASV clusters in buccal, saliva, duodenum, jejunum, and pancreatic

tumor samples. Toghether, our findings suggest that oral, intestinal, and pancreatic microbiomes

are correlated and of oral origin exhibit co-abundance relationships and demonstrate

complex correlation patterns in the intestinal and pancreatic tumor samples. Future prospective

studies should aim to uncover the co-abundance of specific microbial communities for studying

etiology of microbiota-driven carcinogenesis.

Keywords: Oral microbiomes, intestinal microbiomes, gastrointestinal diseases, pancreatic

cancer

3 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Introduction

The oral cavity is a major gateway to the human body. It is estimated that the oral cavity

collectively harbors over 700 predominant bacterial species 1. Oral microbes have been shown to

contribute to a number of oral diseases, including tooth caries, periodontitis, endodontic infection,

alveolar osteitis, and tonsillitis. It is hypothesized that oral opportunistic or pathogenic bacteria

can enter into the blood circulation, passing through the oral mucosal barrier, potentially

resulting in abnormal local and systemic immune and metabolic responses 2. Oral microbiota are

believed to play important roles in systemic diseases such as cardiovascular diseases, diabetes

mellitus, respiratory diseases, and cancer 3-6.

Many studies have investigated the relationship between the oral or gut microbiome and

various cancer risks using different methods and study designs 7,8. Among these, colorectal

cancer (CRC) is the most studied cancer. The unexpected finding that species of Fusobacterum,

particularly the oral species Fusobacterium nucleatum, are very prevalent (about 30%) in CRC

cases suggested an association between the oral microbiota in the colon and CRC 8. Research on

the relationships between oral bacteria and pancreatic cancer risk stems from a number of

observational studies that have reported a higher risk of pancreatic cancer among individuals

with periodontitis, when compared to those without periodontitis 9-11. A number of studies have

examined the association of the oral microbiome with pancreatic cancer risk 12-15, but results

were inconsistent partially due to differences in methods and study designs. Although one recent

study found suggestive evidence that oral dysbiosis is a causative effect of early pancreatic

cancer 12, more prospective studies are needed to replicate and confirm their findings. Defining

oral microbial profiles as non-invasive biomarkers for pancreatic cancer could help screening of

high-risk populations.

4 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

In an effort to address the specific question of whether the pancreas has its own

microbiome, we recruited subjects with planned foregut surgery to obtain pancreatic tissue

samples for 16S rRNA gene microbiome analysis. We reported that bacterial taxa known to

inhabit the oral cavity, including several putative periodontal pathogens, were common in the

pancreas microbiome 16. Moreover, bacterial DNA profiles in the pancreas were similar to those

in the duodenum tissue of the same subjects, regardless of disease state, suggesting that bacteria

may disseminate from the gut into the pancreas. Thereore, the present study explores the oral

microbiome in these same patients, making it a critical first step to understanding whether oral

microbiota may be used as non-invasive biomarkers for monitoring the process of pancreatic

carcinogenesis or disease progression. To our knowledge, no prior study to date has

characterized the overall microbiome at multiple sites in the oral cavity and their correlations

with the microbiome in pancreatic tissue and intestinal tissue or surfaces.

Results

Population and sample characteristics

The present analysis included 52 subjects (Table 1). These subjects were between 31 and

86 years old, and contributed a total 324 samples (52 tongue swab, 54 buccal swab, 35

supragingival swab, 48 saliva swab, 22 duodenum tissue, 34 jejunum swab, and 19 bile duct

swab samples, as well as 21 pancreatic duct, 6 normal pancreatic tissue and 33 pancreatic tumor

samples). Based on the pathology records, ICD10 codes were assigned to each subject: 24

subjects had pancreatic cancer (ICD10 codes C25.0-C25.9), 8 subjects had periampullary

adenocarcinoma (ICD10 codes C24.1) and 4 subject had extrahepatic cholangiocarcinoma

(C24.0), 11 subjects had other pancreatic conditions (ICD10 codes K86.0-K86.3), and the

5 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

remaining 5 had other gastrointestinal conditions. Sixty-two percent of these subjects were ever

smokers and most received antibiotics days prior to surgery (after providing oral specimens).

A total of 4077 unique Amplicon Sequence Variants (ASVs) were identified across oral

swab samples, and 4304 ASVs were identified across pancreatic tissue or intestinal tissue and

swab samples via DATA2. ASVs are also referred as “features” in QIIME2 processing, and can

be roughly understood as unique identifiers reaching up to the levels of bacterial strains or a

group of highly similar strains.

Microbiome communities at different sites

Based on Shannon index (alpha-diversity measure) and Bray-Curtis PCoA plot (beta-

diversity measure), bacterial communities from tongue and saliva samples appear more similar to

each other compared to that from buccal and supragingival samples (Supplemental material 1).

PERMANOVA tests showed that saliva samples were significantly different from tongue

(p=0.005) and from supragingival (p=0.0001) but not different from buccal samples (p=0.02)

with regards to beta-diversity measures. Furthermore, buccal samples were significantly different

from tongue (p=0.006) but not different from supragingival samples (p=0.387), and tongue were

significantly different from supragingival samples (p=0.0001).

Bacterial communities in pancreatic tissue and intestinal tissue or surfaces have been

previously described 16. Briefly, alpha and beta-diversity analyses did not show any visually

apparent clustering by sites (i.e., duodenum tissue, jejunum swab, bile duct swab, pancreatic duct,

and pancreatic tissue samples). Similarly, the PERMANOVA test results were not statistically

significant (no significant differences between sites).

Shared ASVs and

6 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

A total of 73 ASVs were common between oral (any site) and intestinal or pancreatic

samples in at least one subject (Figure 1; Supplemental table 1). Taxonomic annotations of

these shared ASV indicate that , Veillonella, Prevotella, Fusobacterium, Gemella,

Haemophilus, and Rothia are top 7 frequently shared genera (in descending order) between oral

and gut or pancreatic samples. The top 7 most frequently shared species include: Veillonella

parvula, Streptococcus parasanguinis clade 411, Fusobacterium nucleatum subsp. vincentii,

Gemella morbillorum, Haemophilus parainfluenzae, Prevotella veroralis, and Rothia aeria.

Concordance (Kappa) and Pairwise Stratified Association (PASTA)

For both Kappa and PASTA tests, a total of 53 ASVs for which less than 95% of samples

exhibited a relative abundance of zero were tested. Based on the Kappa statistics, seven ASVs

showed significant concordance with regards to the probability of presence or absence between

oral and pancreatic tissue or intestinal samples after adjusting for multiple testing. The taxonomy

of these ASVs (same assigned taxa in all reference databases unless otherwise noted) are:

Oribacterium parvum, Fusobacterium nucleatum subsp. vincentii, Rothia mucilaginosa (GG),

Gemella morbillorum, Rothia aeria (HOMD)/mucilaginosa (GG), Streptococcus parasanguinis

clade 411 (HOMD)/anginosus (GG), Prevotella veroralis (HOMD)/melaninogenica (GG).

However, the estimated probability to be present in both oral and pancreatic tissue or intestinal

samples was low (ranging from 2% to 16%) for these seven ASVs (Table 2).

The PASTA test identified two ASVs (ASV13 and ASV21), Gemella morbillorum and

Streptococcus, that exhibited a significant and a marginally significant association respectively

with regards to the probabilities of absence (p) between oral and pancreatic tissue or intestinal

samples across disease types, when disease status was coded into four groups. For both ASVs,

the probabilities of absence tended to be highest among “C24” subjects, second highest among

7 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

“C25” subjects and lowest among subjects assigned to the “other” or K86.2 group (Figure 2a).

The ASV corresponding to Gemella morbillorum also showed a marginally significant

association (PN<0.1) for the mean relative abundance (μ). When disease status was coded into

three groups (vs 4 groups), similar findings were shown for the two ASVs in the above analysis.

Furthermore, three additional ASVs also showed significant (ASV28: Fusobacterium nucleatum

subsp. vincentii) or marginally significant (ASV67: Veillonella parvula/dispar; ASV19:

Streptococcus parasanguinis clade 411) associations between oral and pancreatic tissue or

intestinal samples with respect to p (Figure 2b). None of the ASVs tested showed significant

associations with regards to the non-zero mean relative abundance (ω). Detailed PASTA test

results of these five ASVs are shown in Table 3.

Patterns of co-abundance between ASVs

A co-abundance network diagram and clustering tree were graphed for the prevalent

ASVs in the saliva (Figure 3), buccal (Figure 4), duodenum (Figure 5), jejunum (Figure 6),

and pancreatic tumor samples (Figure 7), repectively. Only ASVs with an absolute SparCC

correlation value greater than 0.1 and a p-value less than 0.05, were plotted. Although ASVs

belonging to a co-abundance group does not necessarily imply mutual relationships between

them, ASVs in the same co-abundance cluster are likely to respond to environment changes in

the same fashion 17. In the co-abundance network diagrams, ASVs are colored according to the

co-abundance clustering group that they belong to, while their relative positions, ie., being close

together or being opposite end of diagram, represent ASVs’ abundance being directly or

inversely correlated, respectively.

The Ward clustering algorithm identified 6 and 5 co-abundance clusters in saliva and

buccal samples, respectively (Figures 3b and 4b). In buccal samples (Figure 4b), the cluster

8 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

dominated by Veillonella parvula (Red and Green) was inversely correlated with the co-

abundance cluster that mostly contained Fusobacterium nucleatum subsp. vincentii,

Haemophilus parainfluenzae, and Capnocytophaga gingivalis (Orange). In saliva samples

(Figure 3b), the same Capnocytophaga gingivalis ASV, that was also in buccal samples, was

clustered in a co-abundance group with many Fusobacterium nucleatum subsp. vincentii and

Streptococcus ASVs (Grey). Moverover, the co-abundance cluster that contained only

Fusobacterium, Neisseria|, and Haemophilus parainfluenzae (Green) was more positively

correlated with a cluster dominated by Haemophilus ASVs and Streptococcus ASVs (Yellow),

and more inverserly correlated with a cluster of Veillonella parvula ASVs and Prevotella

veroralis ASVs (Red). The five ASVs identified by the PASTA tests belonged to the grey cluster

(ASV28, ASV21, ASV13, and ASV19) and green cluster (ASV67) in buccal samples. They

belonged to the yellow cluster (ASV21 and ASV19), pink cluster (ASV 13 and ASV 67), and

grey cluster (ASV28) in saliva samples.

There were 3, 8 and 4 co-abundance clusters in the duodenum tissue, jejunum swab, and

pancreatic tumor tissue samples, respectively (Figures 5b, 6b, and 7b). All five ASVs identified

by the PASTA tests were shown in the jejunum swab clusters – three ASVs (ASV13, ASV28,

ASV67) belonged to the purple cluster, one (ASV19) belonged to the yellow cluster, and one

(ASV21) belonged to the pink cluster. One duodenum cluster was dominated by Fusobacterium

nucleatum subsp. vincentii ASVs and Klebsiella pneumoniae ASVs (Red), and was inversely

correlated to a cluster (Green) that contained Ralstonia, Bacteroides, Lachnospiraceae, Sacchar

ibacteria, Brevundimonas, and Sphingomonas ASVs. In the pancreatic tumor samples, one

cluster was dominated by Fusobacterium nucleatum subsp. vincentii, Klebsiella pneumoniae,

Campylobacter rectus, Dialister pneumosintes, Prevotella nigrescens, and Parvimonas micra

9 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

ASVs (Red), which contains bacteria that were identified as “the orange complex” – the

transitional population between health and severe periodontal disease 18. Only two (ASV13 and

ASV67) of the five ASVs identified by the PASTA tests were shown in the clusters of ASVs in

duodenum samples. Both (Gemella morbillorum and Veillonella parvula) belonged to the grey

cluster. Two ASVs (ASV13: Gemella morbillorum and ASV67: Veillonella| arvula) consistently

belonged to the same cluster in saliva, duodenum tissue, and jejunum swab samples. None of the

five ASVs were plotted in pancreatic tumor samples’ ASV clusters (due to low or insignificant

correlations with other ASVs) although two (ASV13 and ASV28) were present in the pancreatic

tumor samples (Supplemental table 1).

Discussions

In this paper, we characterized oral microbiome communities at four oral sites (tongue,

buccal, supragingival, and saliva) and examined their correlations with the microbiome in the

pancreatic tissue or intestinal samples using several complementary analyses. We identified a

large number of ASVs that were shared between oral and pancreatic or intestinal samples among

patients with pancreatic cancer and other gastrointestinal diseases. Tongue and saliva samples

were more similar to each other with regards to the alpha and beta diversity measures compared

to the other two oral sites. This results expand the original finding of the Human Microbiome

Project of oral biogeography in healthy subjects 19. More importantly, it underlines that

individuals with cancer and gastrointestinal diseases maintain such biogeography, offering the

option to limit the numbers of oral samples to be collected in future study, in particular

longitudinal prospective studies. When comparing oral to intestinal or pancreatic samples, seven

ASVs showed significant concordance (Kappa statistics) and five ASVs exhibited significant or

10 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

marginally significant associations (PASTA) with regards to the probabilities of absence or

precence. Lastly, our microbial co-abundance analyses showed several distinct ASVs clusters

and complex correlation-networks between ASV clusters in buccal, saliva, duodenum, and

pancreatic tumor samples. Analyzing multiple samples within individuals, the present study is

the first to show correlations between oral and pancreatic or intestinal microbiome among

patients with gastrointestinal diseases or cancer. We believe our study results provide critical

insights for future prospective studies to uncover and define the co-abundance of specific oral

microbiome communities as non-invasive biomarkers for monitoring the process of pancreatic

carcinogenesis or disease progressions.

Emerging research has focused on profiling oral microbiome as potential biomarkers for

disease phenotypes in epidemiological studies 20 because they are noninvasive, and oral

microbiome profiles have shown relative intraindividual stability over time and clear

interindividual differences 21,22. Although many observational studies have shown that specific

oral microbiota in oral cavities and in the fecal samples are associated with oral, head and neck,

lung, colorectal, and pancreatic cancer risks 7,8,23, data on the associations between oral and

tissue microbiome profiles are very limited. To our knowledge, there is only one prior study

examined the microbial profiles that were shared between the oral cavity and tissue samples 24.

Their analyses focused on 17 OTUs that were detected in 37% of both tissue samples (CRC and

polyp) and oral swabs, and identified two tumor-associated bacterial co-abundance clusters.

Specifically, one cluster is comprised of oral pathogens previously linked with late colonization

of oral biofilms and with human diseases including CRC (e.g., Fusobacterum nucleatum,

Parvimonas micra, Peptostreptococcus stomatis, and Dialister pneumosintes), and the other

cluster is comprised of dominant bacteria in early dental biofilm formation including genera

11 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Actinomyces, Haemophilus, Rothia, Streptococcus, and Veilonella 24. A letter to the editor

reported that Fusobacteruim nucleatum was detected in 8 (57.1%) of 14 CRC patients’ tumor

and saliva samples, and identical strains were found in 75% of the tumor and saliva samples

suggesting that F. nucleatum in colorectal tumor originates in the oral cavity 25. In the present

study, we identified 73 ASVs shared between oral and intestinal or pancreatic (Figure 1;

Supplemental table 1). Only F. nucleatum was identified from the 1st cluster of the work of

Flemmer et al. 24, however, all except Actinomyces was found from the second cluster in our

study. We found that F. nucleatum was among the top 3 most frequently shared species based on

the taxonomic annotations of the shared ASVs between oral and intestinal or pancreatic samples.

Our co-abundance analyses identified multiple clusters of ASVs in buccal, saliva,

duodenum, jejunum swab, and pancreatic tumor samples. In buccal and saliva samples,

Capnocytophaga gingivalis ASV was consistently found in the same co-abundance group with

Fusobacterium nucleatum subsp. vincentii ASVs. Of the 4 clusters of ASVs in pancreatic tumor

samples, one cluster with largest number of ASVs comprised mostly the dominant bacteria in

early dental biofilm formation or the genera also associated with relatively healthy tooth pockets

18, and the other three clusters each contain bacterial species that are associated with cancer risks

such as Gemella morbillorum and Fusobacterium nucleatum subsp. vincentii, as well as species

that are associated with periodontal disease or infections such as Prevotella nigrescens,

Campylobacter rectus, and Klebsiella pneumoniae 26,27.

After adjusting for disease status, our PASTA analyses identified two specific bacterial

species (Gemella morbillorum and Fusobacterium nucleatum subsp. vincentii) that showed

consistent presence or absence patterns between oral and intestinal or pancreatic samples, which

warrant further investigations on their associations to the pancreatic cancer risk or progression. A

12 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

few studies had shown associations between these bacterial species and CRC risk. A large

retrospective study found that the risk of CRC was significantly increased in patients with

culture-comfirmed bacteremia (presence of bacteria in blood) from Gemella morbillorum,

previously known as as Streptococcus morbillorum (HR = 15.2; 95% CI = 1.54–150), and from

Fusobacterium nucleatum (HR = 6.89; 95% CI = 1.70–27.9) 28. A case-control study in a cohort

of individuals undergoing screening colonoscopy found that both Gemella morbillorum and F.

nucleatum (part of 21 species-level OTUs) were enriched in fecal samples from CRC patients

compared to controls 23. This study also showed strong co-abundance relationships between

Parvimonas micra, F. nucleatum, and Solobacterium moorei. Another study analyzed the

association of Fusobacterium species in pancreatic tumor with patient prognosis and showed

significantly higher mortality (poorer prognosis) among pancreatic cancer patients with

Fusobacterium species-positive tumors than those with Fusobacterium species-negative tumors

(HR = 2.16; 95% CI = 1.12–3.91) 29.

A growing number of studies have investigated the relationship between the oral

microbiome and pancreatic cancer risk using different methods and study designs, but the results

have been inconsistent and most studies had small sample sizes 12-15. The earliest, published in

2012, is a case-control study examining the oral microbiota of individuals diagnosed with either

pancreatic cancer or a matched health control 13. This study showed an increase in 31 and a

decrease in 25 bacterial species/clusters in the saliva of pancreatic cancer patients compared with

healthy controls; two oral bacterial candidates (Neisseria elongata and Streptococcus mitis) were

validated in an independent sample population with the finding of significant reductions (P <

0.05; qPCR) in pancreatic cancer subjects compared to controls. A clinic-based case-control

study found that pancreatic cancer patients had higher proportions of the genus Leptotrichia and

13 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

lower proportions of Porphyromonas and Neisseria in saliva, as compared to healthy controls or

those with other diseases (P < 0.001). However, this report found that the relative abundances of

previously identified bacterial biomarkers (e.g., Streptococcus mitis and Granulicatella adiacens)

were not significantly different in the saliva of pancreatic cancer patients 15. The largest study to

date of the oral microbiome and pancreatic cancer capitalized on data collected on patients in 2

large prospective cohort studies to conduct a nested case-control study 12. The results showed

that presence (vs absence) of P. gingivalis and Aggregatibacter actinomycetemcomitans in saliva

collected prior to cancer diagnosis was associated with 60% and 120% increase in risk of

pancreatic cancer, respectively. In contrast, the presence (vs absence) of Leptotrichia in saliva

was associated with 13% decreased risk of pancreatic cancer 12. The most recently published

study evaluated the characteristics of the oral microbiota in patients with pancreatic ductal

adenocarcinoma (PDAC), intraductal papillary mucinous neoplasms (IPMNs), and healthy

controls 14. This case-control study found no differences between patients with PDAC and

healthy controls, or between patients with PDAC and those with IPMNs, on measures of alpha

diversity of the oral microbiota. PDAC patients had higher levels of and several

related taxa (, Lactobacillales, Streptococcaceae, Streptococcus, Streptococcus

thermophilus), although, after adjustment for multiple testing, results remained significant at the

phylum level only.

Our study has several strengths. We used ASVs from Illumina-scale amplicon data in all

analyses without imposing the arbitrary dissimilarity thresholds that define molecular OTUs.

ASVs capture all biological variation present in the data, and ASVs are consistently labelled with

intrinsic biological meaning as a DNA sequence 30. Thus, our study results can be reliably

reproduced and validated in future studies. However, our study also had some limitations; our

14 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

sample size was small and PASTA tests were likely underpowered. Moreover, we did not have

samples from healthy controls and could not compare our findings with prior studies due to

differences in methods.

Conclusions

Taken together, the results of the present study suggest that oral, intestinal, and pancreatic

microbiomes are correlated, and bacteria of oral origin exhibit co-abundance relationships and

demonstrate complex correlation patterns in the intestinal and pancreatic tumor samples. These

findings provide critical insights on microbial communities and species that are common in oral

cavity, intestinal or pancreatic tissue samples among patients cancer and other gastrointestinal

diseases. Though, due to cross-sectional study design, we are unable to make any conclusion

regarding their roles in disease progression or whether these bacterial species were colonizing or

just passing through various body sites. Our findings should be validated in independent and

adequately sized populations with appropriate controls. Moreover, growing evidence have shown

that bacterial species may survive, decline, and adapt as interdependent functional groups

responding to environmental changes 17,31,32. Future studies should aim to uncover the co-

abundance of specific microbial communities for studying etiology of microbiota-driven

carcinogenesis in prospective and longitudinal studies.

Methods

Sampling, DNA extraction and 16S rRNA gene amplicon sequencing

Subjects of the present study were a subset of subjects who underwent surgery for

pancreatic diseases or diseases of the foregut, at the Rhode Island Hospital (RIH) between 2014

15 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

and 2016. Details have been described in our previous publication 16. Briefly, data on

participants’ demographics and behavioral factors were collected using a self-administered

questionnaire, and pancreatic tissue samples and gastrointestinal swabs were collected during

surgery using DNA-free forensic sterile swabs whenever possible to reduce contamination. Oral

swabs were collected from participants prior to surgery using sterile cytology brushes which

were immediately placed in tubes containing 700 ul RNA later solution after collection. Saliva

was collected using saliva kits (OMNIgene OM-501, DNA Genotek). All samples were de-

identified and stored at -80ºC until processing for DNA extraction. Hypervariable regions of the

16S rRNA gene were sequenced using primers targeting the V3-V4 as paired-end reads on an

Illumina platform. Detailes for DNA extraction and sequencing procedures are provided in our

previous publication 16. Of the 77 enrolled subjects, 52 subjects with both oral swab samples

(tongue, buccal mucosa, supragingival, or saliva) and at least one pancreatic tissue (pancreatic

duct, normal or tumor pancreas) or intestinal (duodenum tissue, jejunum swab, bile duct swab)

sample, were included in the analyses. Samples of the duodenal stent, stomach swab, ileum

swab, and pancreatic swab were exluded due to the small number of samples available.

Sequence quality checking and denoising were performed using DADA2 Illumina

sequence denoising process 33. Human-associated DNA contaminants were screened out using

Bowtie2 34. Taxonomic classification, alignment, and phylogenetic tree building were completed

using the Quantitative Insights Into Microbial Ecology version 2 (QIIME2) 35,36. The sequences

of each sample were rarefied to 1200 to even the difference in sequencing depth across both oral

and intestinal samples for further analysis. The choice of 1200 as sampling depth was guided by

reviewing an alpha rarefaction curve that tested various depths ranging between 500 and 5,000

(Supplementary Material 2).

16 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Assigning taxonomic annotation

To predict the taxonomic groups that are present in each sample, QIIME2 plugin (q2-

feature-classifier) was used to train naïve Bayes classifiers using multiple databases as different

set of reference sequences. These were the Human Oral Microbiome Database (HOMD) (version

15.1), the Silva (release 132), and the Greengenes (13_8 revision) 99% OTUs (Operational

taxanomic units) 16S rRNA gene databases, all trimmed to contain the V3-V4 hypervariable

region. HOMD identification was chosen as the default taxonomy. Whenever HOMD, Silva, and

Greengenes yielded different taxonomic results at the species level, taxonomic information from

all datasets were kept and reported.

Statistical analyses

Analyses were carried out in QIIME2 (https://qiime2.org), in R 37, and in Python. All

analyses were conducted using ASV as the unit of observation in order to accurately capture

bacterial strain level variations.

Alpha & Beta Diversity

For the calculation of alpha and beta diversity measures of oral microbiome, a phylogenetic

tree was first created in order to generate phylogenetic diversity measures such as Faith’s

Phylogenetic Diversity, unweighted and weighted UniFrac distances 38,39. Creating a

phylogenetic tree requires multiple sequence alignment, masking, tree building, and rooting. The

masking step removes alignment positions that do not contain enough conservation to provide

meaningful information (default 40%). Next, evenness and diversity of oral microbiota in each

sample was assessed to examine the variation in the microbial profile across different oral

sampling sites. Pielou’s Evenness test and Faith’s Phylogenetic Diversity test were calculated

using QIIME2 diversity analyses (q2diversity) plugin 40. Computed distances were then used to

17 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

generate principal coordinate analysis (PCoA) plots to visualize the arrangement of the samples

in the ordination space. PERMANOVA tests 41 were conducted to compare beta-diversity

measures between sampling sites. Bonferroni correction was used to adjusted for multiple testing,

and thus a p-value less than 0.008 was considered significant differene in beta-diversity measures

between oral sites.

Identifying Shared ASVs

Descriptive analyses were performed to identify shared ASVs between sites. Rarefied

features with a relative abundance less than or equal to 0.01 (≤1%) were set to zero. Shared

ASVs between any one of the oral sites (tongue, buccal mucosa, supragingival, or saliva) and

any pancreatic tissue or intestinal sites (duodenum tissue, jejunum swab, bile duct swab) for each

subject were identified. A heatmap of the ASVs that were shared by one or more subjects was

generated in R using the packages intersect and ggplot2.

Concordance and Pairwise Stratified Association

Associations of ASVs between oral and pancreatic tissue or intestinal site samples were

investigated using two different types of statistical tests that account for pairing and within-

subject correlation. For both tests, rarefied features with a relative abundance less than or equal

to 0.01 (≤1%) were set to zero (i.e., set to absence), and only ASVs for which less than 95% of

samples exhibited a relative abundance of zero were tested.

The first test (Kappa) evaluated general concordance in presence or absence of a given ASV

in each subject. Specifically, for each subject, samples were divided into two groups: 1) oral

samples, and 2) pancreatic tissue or intestinal site samples. The target ASV was denoted as

present in a group if at least one of the samples had a relative abundance value larger than zero,

or denoted as absent if all relative abundance values in the group were zero. This reformatted

18 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

data thus resulted in two values for each subject: 1) whether or not the target ASV was present

across oral samples and 2) whether or not the target ASV was present across pancreatic tissue or

intestinal sites for that subject. Based on these values, Cohen’s Kappa concordance statistic was

computed 42 utilizing the R package “irr”, and a one-sided test based on Kappa’s large sample

standard deviation was conducted to examine whether there is significant agreement in presence

or absence between oral and pancreatic tissue or intestinal site samples for a given ASV.

Additionally, a test for Pairwise Stratified Association (PASTA) was performed to identify

those ASVs which exhibit consistent patterns in relative abundance between oral and pancreatic

tissue or intestinal site samples after adjusting for disease status (represented by ICD10 codes).

PASTA test was based on a Bayesian regression model to obtain Markov-Chain Monte Carlo

estimates of abundance among strata, to calculate a correlation statistic, and to conduct a formal

test based on its posterior distribution. For the present analyses, two disease-status stratifications

were considered: Grouping samples by four ICD10 codes “C24.*”, “C25.*”, “K86.2” and “other”

(encodes other gastrointestinal conditions), and grouping samples by three ICD10 codes “C24.*”,

“C25.*” and “other” (includes K86.2 and other gastrointestinal conditions). An ASV was defined

to exhibit a consistent PASTA pattern, if either one of three possible quantities was associated

between oral and pancreatic tissue or intestinal site samples: 1) The probability of absence (p),

which denotes the probability of observing a relative abundance of zero, 2) The non-zero mean

relative abundance (ω), which denotes mean relative abundance among those samples in which

ASV is present; and 3) The mean relative abundance across all samples (μ).

To conduct the PASTA test, p, ω, and μ were first estimated in each site-comparison-by-

disease-status subgroup based on a Bayesian Zero-inflated Beta regression model, which was fit

utilizing the OpenBUGS statistical analysis software and the R package “R2OpenBUGS” 43.

19 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

This regression model accounted for within-subject correlation due to repeated measurements via

the inclusion of a subject-specific random intercept term. Using the subgroup estimates of p, ω,

and μ, the posterior probability of exhibiting no association (PN) between the two site-

comparison groups was calculated. This probability can be understood as a rejection threshold,

where a value of less than or equal to 0.05 was considered statistically significant (i.e., strong

evidence of association) and a value of less than 0.1 was considered as marginally significant

(i.e., moderate evidence of association). Further information about the data model, approach, and

performance of the PASTA test have been published in detail 44.

ASV co-abundance groups

ASVs shared by more than 10% of the buccal, saliva, duodenum, and pancreatic tumor

samples were considered prevalent ASVs. Correlations between these prevalent ASVs within

each sampling site were calculated using the SparCC method (Sparse Correlations for

Compositional data) 45. The statistical significance of these correlation coefficients was assessed

using a bootstrap procedure and then converted into a correlation distance matrix. Next, Ward

clustering algorithm, a top-down clustering approach, was used to cluster ASVs within each

sampling site into co-abundance groups. Starting from top of the Ward clustering tree,

Permutational MANOVA (9999 permutations, p<0.001) was used to sequentially test whether

any two branches of the tree were significantly different 46. ASVs within the same co-abundance

group increased or decreased in abundance together.

ASV co-abundance network

Bacterial co-abundance networks illustrate how groups of ASVs occupy different niches

of the microbial ecosystem. The co- abundance networks were visualized as force-directed

network plots using Python package NetworkX (version 2.2) with the spring layout of the

20 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Fruchterman-Reingold algorithm (k=0.15) 47 for buccal, saliva, duodenum, and pancreatic tumor

samples. Only ASVs with an absolute SparCC correlation value greater than 0.1 and a p-value

less than 0.05, were plotted. Force-directed algorithms simulate correlation coefficient values as

basic physical properties 48. Positive and negative correlations are modeled as gravity and

repulsion, respectively. ASVs with strong positive correlations between each other are plotted

near each other, while negatively correlated ASVs are pushed away from each other. In the co-

occurrence network plots, each node represented a unique bacterial ASV. Lines between nodes

represented correlations between the nodes they connect; with line width indicated the

correlation magnitude. Nodes were colored according to their final co-abundance group

assignment, calculated using Ward clustering algorithm described earlier. During graphing

simulation, the forces were applied to the nodes, pulling them together or pushing them further

apart. This iterative process continued until the system reached equilibrium state (lowest total

energy) and the relative position of nodes stopped changing from one iteration to the next.

References:

1 Dewhirst, F. E. et al. The human oral microbiome. Journal of bacteriology 192, 5002- 5017, doi:10.1128/jb.00542-10 (2010). 2 Slocum, C., Kramer, C. & Genco, C. A. Immune dysregulation mediated by the oral microbiome: potential link to chronic inflammation and atherosclerosis. J Intern Med 280, 114-128, doi:10.1111/joim.12476 (2016). 3 Leishman, S. J., Do, H. L. & Ford, P. J. Cardiovascular disease and the role of oral bacteria. J Oral Microbiol 2, doi:10.3402/jom.v2i0.5781 (2010). 4 Mealey, B. L., Oates, T. W. & American Academy of, P. Diabetes mellitus and periodontal diseases. J Periodontol 77, 1289-1303, doi:10.1902/jop.2006.050459 (2006). 5 Gomes-Filho, I. S., Passos, J. S. & Seixas da Cruz, S. Respiratory disease and the role of oral bacteria. J Oral Microbiol 2, doi:10.3402/jom.v2i0.5811 (2010). 6 Wang, L. & Ganly, I. The oral microbiome and oral cancer. Clin Lab Med 34, 711-719, doi:10.1016/j.cll.2014.08.004 (2014). 7 Karpinski, T. M. Role of Oral Microbiota in Cancer Development. Microorganisms 7, doi:10.3390/microorganisms7010020 (2019).

21 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

8 Chen, J., Domingue, J. C. & Sears, C. L. Microbiota dysbiosis in select human cancers: Evidence of association and causality. Semin Immunol 32, 25-34, doi:10.1016/j.smim.2017.08.001 (2017). 9 Michaud, D. S. Role of bacterial infections in pancreatic cancer. Carcinogenesis 34, 2193-2197, doi:10.1093/carcin/bgt249 (2013). 10 Michaud, D. S., Fu, Z., Shi, J. & Chung, M. Periodontal Disease, Tooth Loss, and Cancer Risk. Epidemiol Rev 39, 49-58, doi:10.1093/epirev/mxx006 (2017). 11 Michaud, D. S. et al. Lifestyle, dietary factors, and antibody levels to oral bacteria in cancer-free participants of a European cohort study. Cancer Causes Control 24, 1901- 1909, doi:10.1007/s10552-013-0265-2 (2013). 12 Fan, X. et al. Human oral microbiome and prospective risk for pancreatic cancer: a population-based nested case-control study. Gut 67, 120-127, doi:10.1136/gutjnl-2016- 312580 (2018). 13 Farrell, J. J. et al. Variations of oral microbiota are associated with pancreatic diseases including pancreatic cancer. Gut 61, 582-588, doi:10.1136/gutjnl-2011-300784 (2012). 14 Olson, S. H. et al. The oral microbiota in patients with pancreatic cancer, patients with IPMNs, and controls: a pilot study. Cancer Causes Control 28, 959-969, doi:10.1007/s10552-017-0933-8 (2017). 15 Torres, P. J. et al. Characterization of the salivary microbiome in patients with pancreatic cancer. PeerJ 3, e1373, doi:10.7717/peerj.1373 (2015). 16 Del Castillo, E. et al. The Microbiomes of Pancreatic and Duodenum Tissue Overlap and Are Highly Subject Specific but Differ between Pancreatic Cancer and Noncancer Subjects. Cancer Epidemiol Biomarkers Prev 28, 370-383, doi:10.1158/1055-9965.EPI- 18-0542 (2019). 17 Zhang, C. et al. Dietary Modulation of Gut Microbiota Contributes to Alleviation of Both Genetic and Simple Obesity in Children. EBioMedicine 2, 968-984, doi:10.1016/j.ebiom.2015.07.007 (2015). 18 Palmer, R. J., Jr. Composition and development of oral bacterial communities. Periodontol 2000 64, 20-39, doi:10.1111/j.1600-0757.2012.00453.x (2014). 19 Segata, N. et al. Composition of the adult digestive tract bacterial microbiome based on seven mouth surfaces, tonsils, throat and stool samples. Genome Biol 13, R42, doi:10.1186/gb-2012-13-6-r42 (2012). 20 Yoshizawa, J. M. et al. Salivary biomarkers: toward future clinical and diagnostic utilities. Clin Microbiol Rev 26, 781-791, doi:10.1128/CMR.00021-13 (2013). 21 Lazarevic, V., Whiteson, K., Hernandez, D., Francois, P. & Schrenzel, J. Study of inter- and intra-individual variations in the salivary microbiota. BMC Genomics 11, 523, doi:10.1186/1471-2164-11-523 (2010). 22 Costello, E. K. et al. Bacterial community variation in human body habitats across space and time. Science 326, 1694-1697, doi:10.1126/science.1177486 (2009). 23 Yu, J. et al. Metagenomic analysis of faecal microbiome as a tool towards targeted non- invasive biomarkers for colorectal cancer. Gut 66, 70-78, doi:10.1136/gutjnl-2015- 309800 (2017). 24 Flemer, B. et al. The oral microbiota in colorectal cancer is distinctive and predictive. Gut 67, 1454-1463, doi:10.1136/gutjnl-2017-314814 (2018).

22 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

25 Komiya, Y. et al. Patients with colorectal cancer have identical strains of Fusobacterium nucleatum in their colorectal cancer and oral cavity. Gut, doi:10.1136/gutjnl-2018- 316661 (2018). 26 Chung, P. Y. The emerging problems of Klebsiella pneumoniae infections: carbapenem resistance and biofilm formation. FEMS Microbiol Lett 363, doi:10.1093/femsle/fnw219 (2016). 27 Maiden, M. F., Tanner, A. & Macuch, P. J. Rapid characterization of periodontal bacterial isolates by using fluorogenic substrate tests. J Clin Microbiol 34, 376-384 (1996). 28 Kwong, T. N. Y. et al. Association Between Bacteremia From Specific Microbes and Subsequent Diagnosis of Colorectal Cancer. Gastroenterology 155, 383-390 e388, doi:10.1053/j.gastro.2018.04.028 (2018). 29 Mitsuhashi, K. et al. Association of Fusobacterium species in pancreatic cancer tissues with molecular features and prognosis. Oncotarget 6, 7209-7220, doi:10.18632/oncotarget.3109 (2015). 30 Callahan, B. J., McMurdie, P. J. & Holmes, S. P. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. ISME J 11, 2639-2643, doi:10.1038/ismej.2017.119 (2017). 31 Zhou, Y. et al. Biogeography of the ecosystems of the healthy human body. Genome Biol 14, R1, doi:10.1186/gb-2013-14-1-r1 (2013). 32 Claesson, M. J. et al. Gut microbiota composition correlates with diet and health in the elderly. Nature 488, 178-184, doi:10.1038/nature11319 (2012). 33 Callahan, B. J. et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods 13, 581-583, doi:10.1038/nmeth.3869 (2016). 34 Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature methods 9, 357 (2012). 35 Fadrosh, D. W. et al. An improved dual-indexing approach for multiplexed 16S rRNA gene sequencing on the Illumina MiSeq platform. Microbiome 2, 6, doi:10.1186/2049- 2618-2-6 (2014). 36 Bolyen, E. et al. QIIME 2: Reproducible, interactive, scalable, and extensible microbiome data science. Report No. 2167-9843, (PeerJ Preprints, 2018). 37 R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2016). 38 Lozupone, C. & Knight, R. UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71, 8228-8235, doi:10.1128/AEM.71.12.8228- 8235.2005 (2005). 39 Lozupone, C. A., Hamady, M., Kelley, S. T. & Knight, R. Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities. Appl Environ Microbiol 73, 1576-1585, doi:10.1128/AEM.01996-06 (2007). 40 Bokulich, N. A. et al. q2-longitudinal: Longitudinal and Paired-Sample Analyses of Microbiome Data. mSystems 3, e00219-00218 (2018). 41 Anderson, M. J. A new method for nonparametric multivariate analysis of variance. Austral ecology 26, 32-46 (2001). 42 Fleiss, J. L., Cohen, J. & Everitt, B. S. Large sample standard errors of kappa and weighted kappa. Psychological Bulletin 72, 323-327, doi:10.1037/h0028106 (1969).

23 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

43 Sturtz, S., Ligges, U. & Gelman, A. E. R2WinBUGS: a package for running WinBUGS from R. (2005). 44 Meier, R. et al. A Bayesian framework for identifying consistent patterns of microbial abundance between body sites. Stat Appl Genet Mol Biol, doi:10.1515/sagmb-2019-0027 (2019). 45 Friedman, J. & Alm, E. J. Inferring correlation networks from genomic survey data. PLoS computational biology 8, e1002687 (2012). 46 Zhang, C. et al. Dietary modulation of gut microbiota contributes to alleviation of both genetic and simple obesity in children. EBioMedicine 2, 968-984 (2015). 47 Fruchterman, T. M. & Reingold, E. M. Graph drawing by forcedirected placement. Software: Practice and experience 21, 1129-1164 (1991). 48 Fernandez, M., Riveros, J. D., Campos, M., Mathee, K. & Narasimhan, G. Microbial" social networks". BMC genomics 16, S6 (2015).

24 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Acknowledgements

We thank the participants who graciously enrolled in this study. We thank Ms. Priyanka Joshi for

her tremendous help with recruitment of subjects at the RIH, and Dr. Ross Taliano for assisting

with the preparation of the tissue specimens in the Pathology Department at the RIH.

Authors' contributions

J.I. and D.S.M. designed this study and obtained funding. N.Z., R.M., and D.C.K.

developed the methodology used in the analyses. Sample collection and processing were

performed by E.C., B.J.P, K.T.K, J.I, D.S.M., and K.C. Analyses and interpretation of data (e.g.,

statistical analysis, biostatistics, computational analysis were performed by M.C, N.Z., R.M.,

D.C.K. G.W., and D.S.M. All authors contributed to draft the manuscript and revised it critically.

Additioanl Information

Ethics approval and consent to participate

The study was approved by Lifespan’s Research Protection Office for recruitment at RIH,

as well as the Institutional Review Boards for Human Subjects Research at Brown University,

Tufts University and the Forsyth Institute. An informed consent was obtained from all subjects.

All methods carried out were in in accordance with Helsinki Declaration as revised in 2013.

Consent for publication

Not applicable.

Availability of data and material

Sequence data have been deposited in Sequence Read Archive (SRA), accessible with the

following link after January 1, 2020:

25 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

https://www.ncbi.nlm.nih.gov/sra/PRJNA558364

Competing interests

The authors declare no competing interests.

Funding

The research reported in this publication was supported by the NIH/NCI grants R01 CA166150

and P30 CA168524.

26 medRxiv preprint (which wasnotcertifiedbypeerreview)istheauthor/funder,whohasgrantedmedRxivalicensetodisplaypreprintinperpetuity. Table 1. Characteristics of RIH subjects who had at least one oral sample and at least one intestinal or pancreatic sample

RIH Subjects (N=52) doi: Characteristic Mean (SD) https://doi.org/10.1101/2020.01.30.20019752 Age (years) 64.1 (12.5) Body mass index (kg/m2) 26.3 (4.9), n=51 n (%) Sex

Male 24 (46) All rightsreserved.Noreuseallowedwithoutpermission. Female 28 (54) Race

Caucasian 49 (94) Black 1 (2) Asian 2 (4)

Smoking status ; this versionpostedFebruary3,2020. Ever smoker 32 (61.5) Nonsmoker 19 (36.5) Missing 1 (2) Chemotherapy

Never 37 (71.2) Prior to past 6 months 5 (9.6) In past 6 months 6 (11.5) Missing 4 (7.7) SD = standard deviation The copyrightholderforthispreprint

27 medRxiv preprint (which wasnotcertifiedbypeerreview)istheauthor/funder,whohasgrantedmedRxivalicensetodisplaypreprintinperpetuity. Table 2. Seven ASVs that showed significant agreement with regards to the probabilities of presence or absence (Kappa statistics) between any oral site and any pancreatic tissue or intestinal samples. doi:

ASV IDb Kappa (SD) P adj fdr Agree obs Prop.11 Prop.10 Prop.01 Prop.00 Genusa Species (HOMD)a https://doi.org/10.1101/2020.01.30.20019752 ASV27 0.322 (0.101) 0.013 0.868 0.038 0.132 0.000 0.830 Oribacterium parvum ASV28 0.322 (0.101) 0.013 0.868 0.038 0.132 0.000 0.830 Fusobacterium nucleatum subsp. vincentii ASV23 0.399 (0.124) 0.013 0.868 0.057 0.113 0.019 0.811 Rothia mucilaginosa (GG) All rightsreserved.Noreuseallowedwithoutpermission. ASV13 0.356 ( 0.131) 0.036 0.792 0.094 0.151 0.057 0.698 Gemella morbillorum ASV06 0.321 (0.117) 0.036 0.679 0.17 0.283 0.038 0.509 Rothia aeria / mucilaginosa GG)

ASV70 0.224 (0.087) 0.043 0.887 0.019 0.000 0.113 0.868 Streptococcus parasanguinis clade ; this versionpostedFebruary3,2020. 411 / anginosus (GG) ASV25 0.215 (0.085) 0.043 0.792 0.038 0.208 0.000 0.755 Prevotella veroralis / melaninogenica (GG) Agree obs = observed agreement; ASV = Amplicon Sequence Variants; SD = standard deviation; P adj fdr = p-value adjusted for multiple testing via the false discovery rate method; Prop.11 = probability of an ASV to be present in both oral and any pancreatic tissue or intestinal samples; Prop.10 = probability of an ASV to be present in oral samples but absent in pancreatic or intestinal samples; Prop.01 = probability of an ASV to be absent in oral samples but present in pancreatic or intestinal samples; Prop.00 = probability of an ASV to be absent in both oral and any pancreatic tissue or intestinal samples. a Taxonomic annotations of the ASVs are from HOMD database unless otherwise noted. GG = geen gene database. The copyrightholderforthispreprint b ASV ID used in the Figure 1 left panel.

28 medRxiv preprint (which wasnotcertifiedbypeerreview)istheauthor/funder,whohasgrantedmedRxivalicensetodisplaypreprintinperpetuity. Table 3. Results of the PASTA test when coding disease status into three groups (“C24.*”, “C25.*”, “other”). Five ASVs showed consistent patterns between oral samples and pancreatic or intestinal samples with regards to the probability of absence (p), non-zero doi: mean relative abundance (ω), or mean relative abundance (μ). The estimated posterior probability of there being no association (PN) https://doi.org/10.1101/2020.01.30.20019752 between oral samples and pancreatic or intestinal samples is provided in two forms: TP evaluates existence of a linear relationship and

TS evaluates the existence of a directional relationship (i.e. consistency of rankings). PN < 0.1 represents moderate evidence of a All rightsreserved.Noreuseallowedwithoutpermission. consistent pattern and PN < 0.05 represents strong evidence of a consistent pattern.

ASV IDb PN PN PN PN PN PN Genusa Species (HOMD)a μ μ ω ω p p TP TS TP TS TP TS ; this versionpostedFebruary3,2020. ASV28 NA NA NA NA 0.0274 0.025 Fusobacterium nucleatum subsp. vincentii ASV21 NA NA NA NA 0.03 0.03 Streptococcus ASV13 0.0444 0.045 0.4774 0.4744 0.0432 0.0462 Gemella morbillorum ASV19 NA NA NA NA 0.11 0.0976 Streptococcus parasanguinis clade 411 ASV67 NA NA NA NA 0.0952 0.102 Veillonella parvula NA = not applicable; PN posterior probability of there being no association (PN<0.1 is moderate evicence, PN<0.05 is strong evidence); TP evaluates existence of a linear relationship; TS evaluates the existence of a directional relationship (i.e. consistency of

rankings) The copyrightholderforthispreprint a Taxonomic annotations of the ASVs are from HOMD database unless otherwise noted. b ASV ID used in the Figure 1 left panel.

29 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Figure Titles and Legends

Figure 1. Shared ASVs (number of subjects) and Relative Abundance by Sampling Sites.

Legends: On the left panel of the heatmap, each ASV is labeled with a ASV ID and the number

of subjects who have the ASV present in both their oral and intestinal & pancreatic samples.

Taxonomic annotations of these shared ASV are also provided on the right panel of the heatmap.

Figure 2. ASVs that exhibited associations between oral samples and pancreatic tissue or

intestinal samples after adjusting for disease types and within-subject correlation structure.

Disease types were grouped into 4 groups in panel a, and were grouped into 3 groups in panel b.

Taxonomic annotations of the ASVs are provided on top of each result plot. Legends: panc or

intestinal = pancreatic and intestinal samples; PN = posterior probability of there being no

association (PN<0.1 is moderate evicence, PN<0.05 is strong evidence); TP evaluates existence

of a linear relationship; TS evaluates the existence of a directional relationship (i.e. consistency

of rankings). Plots depict parameter estimates (blue diman or red box) and their 95% credible

intervals. For results of the site-specific mean relative abundance (mu), the observed data (blue

or red stars) are also plotted next to the intervals.

Figure 3. ASVs co-abundance network diagram (panel a) and Ward clusters (panel b) for saliva

samples. Legends: Nodes in co-abundance network diagram were colored according to their final

co-abundance group assignment described in Ward clusters.

30 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Figure 4. ASVs co-abundance network diagram (panel a) and Ward clusters (panel b) for buccal

samples. Legends: Nodes in co-abundance network diagram were colored according to their final

co-abundance group assignment described in Ward clusters.

Figure 5. ASVs co-abundance network diagram (panel a) and Ward clusters (panel b) for

duodenum samples. Legends: Nodes in co-abundance network diagram were colored according

to their final co-abundance group assignment described in Ward clusters.

Figure 6. ASVs co-abundance network diagram (panel a) and Ward clusters (panel b) for

jejunum swab samples. Legends: Nodes in co-abundance network diagram were colored

according to their final co-abundance group assignment described in Ward clusters.

Figure 7. ASVs co-abundance network diagram (panel a) and Ward clusters (panel b) for

pancreatic tumor samples. Legends: Nodes in co-abundance network diagram were colored

according to their final co-abundance group assignment described in Ward clusters.

31 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

3a. Saliva co-abundance network Height 3b.

0.0 0.5 1.0 1.5 2.0 medRxiv preprint (which wasnotcertifiedbypeerreview)istheauthor/funder,whohasgrantedmedRxivalicensetodisplaypreprintinperpetuity.

Stomatobaculum|longum Veillonella|parvula*13 Veillonella|parvula*8 doi: Prevotella|veroralis*14 Atopobium| https://doi.org/10.1101/2020.01.30.20019752 Prevotella|*3 Prevotella|veroralis*11 Veillonella|parvula*10 Prevotella|veroralis*9 Veillonella|parvula*11 Fusobacterium|*3 Haemophilus|parainfluenzae*3 Haemophilus|parainfluenzae*4 Neisseria|*1 Oribacterium|asaccharolyticum*1

Lachnoanaerobaculum|orale All rightsreserved.Noreuseallowedwithoutpermission. ASV06: Rothia|aeria*1 Prevotella|*4 ASV25: Prevotella|veroralis*6 Leptotrichia|wadei*3 Stomatobaculum|sp._HMT_097 Peptostreptococcaceae_[XI][G-1]|sulci Prevotella|pallens*2 Saccharibacteria_(TM7)_[G-1]|bacterium_HMT_352 Ruminococcaceae_[G-2]|bacterium_HMT_085 Prevotella|nanceiensis*2 Solobacterium|moorei*2 Prevotella|veroralis*12

Prevotella|veroralis*4 ;

Prevotella|pallens*3 this versionpostedFebruary3,2020. Veillonella|parvula*18 Megasphaera|micronuciformis*1 Fusobacterium|nucleatum_subsp._vincentii*6 Prevotella|*1 Fusobacterium|nucleatum_subsp._vincentii*10 Fusobacterium|nucleatum_subsp._vincentii*2 Fusobacterium|nucleatum_subsp._vincentii*8 Selenomonas| Fusobacterium|nucleatum_subsp._vincentii*18 Haemophilus|parainfluenzae*8 Haemophilus|parainfluenzae*6 Porphyromonas|pasteri*2 Gemella|morbillorum*5 ASV21: Streptococcus|*4 ASV19: Streptococcus|parasanguinis_clade_411*6 Rothia|aeria*4 Rothia|aeria*6 Granulicatella|adiacens Haemophilus|parainfluenzae*7 Alloprevotella|sp._HMT_308 Haemophilus|influenzae*1 The copyrightholderforthispreprint Neisseria|*2 Prevotella|nigrescens*1 Haemophilus|influenzae*2 Porphyromonas|pasteri*3 Granulicatella|elegans Gemella|morbillorum*6 Streptococcus|*3 Prevotella|pallens*4 Rothia|aeria*3 Ruminococcaceae_[G-1]|bacterium_HMT_075 ASV27: Oribacterium|parvum*1 Veillonella|parvula*16 Neisseria|bacilliformis Veillonella|parvula*14 Streptococcus|*2 ASV23: Rothia| Streptococcus|parasanguinis_clade_411*3 Porphyromonas|pasteri*4 Prevotella|nanceiensis*1 Actinomyces|*5 Prevotella|nanceiensis*3 Prevotella|veroralis*16 Prevotella|veroralis*5 Veillonella|parvula*19 Veillonella|sp._HMT_917 Prevotella|veroralis*13 Prevotella|veroralis*15 Streptococcus|*6 Streptococcus|salivarius*1 Saliva Cluster Dendrogram Rothia|aeria*5 Rothia|aeria*7 Prevotella|denticola*1 Prevotella|nigrescens*2 Streptococcus|parasanguinis_clade_411*5 Fusobacterium|*5 Leptotrichia|sp._HMT_215*1 Butyrivibrio|sp._HMT_455 Campylobacter|curvus Catonella|morbi ASV13: Gemella|morbillorum*2 Leptotrichia|wadei*1 Fusobacterium|nucleatum_subsp._vincentii*12 Actinomyces|sp._HMT_180*4 Leptotrichia|wadei*2 Veillonella|parvula*17 Veillonella|parvula*6 Campylobacter|*2 Megasphaera|micronuciformis*4 Leptotrichia|wadei*4 Veillonella|parvula*5 Capnocytophaga|sputigena Fusobacterium|*1 Lautropia|mirabilis Pasteurellaceae(F)|*1 Kingella|sp._HMT_012*2 Abiotrophia|defectiva Leptotrichia|*1 Actinomyces|sp._HMT_180*7 Cardiobacterium|hominis Kingella|sp._HMT_012*1 Leptotrichia|sp._HMT_215*2 Streptococcus|mutans Prevotella|veroralis*7 Streptococcus|parasanguinis_clade_411*4 Prevotella|veroralis*8 Megasphaera|micronuciformis*3 Prevotella|veroralis*10 Prevotella|veroralis*3 Rothia|aeria*2 Streptococcus|*1 Fusobacterium|nucleatum_subsp._vincentii*17 Bergeyella|sp._HMT_322*1 Fusobacterium|*2 Haemophilus|parainfluenzae*1 Oribacterium|parvum*2 Prevotella|oris*1 Lachnoanaerobaculum| Porphyromonas|pasteri*1 Actinomyces|sp._HMT_180*2 Streptococcus|parasanguinis_clade_411*2 Haemophilus|parainfluenzae*2 Prevotella|pallens*1 Megasphaera|micronuciformis*5 Megasphaera|micronuciformis*2 Oribacterium|asaccharolyticum*2 Prevotella|denticola*2 Streptococcus|parasanguinis_clade_411*7 Prevotella|*5 Haemophilus|parainfluenzae*5 Peptostreptococcus|stomatis Alloprevotella|tannerae*1 Aggregatibacter|aphrophilus Fusobacterium|nucleatum_subsp._vincentii*1 Bergeyella|sp._HMT_322*2 Veillonella|parvula*21 ASV67: Veillonella|parvula*15 Tannerella|forsythia Streptococcus|salivarius*2 Streptococcus|*7 Prevotella|veroralis*2 Prevotella|oris*3 Prevotella|oris*2 Mogibacterium| Fusobacterium|nucleatum_subsp._vincentii*5 Fusobacterium|nucleatum_subsp._vincentii*3 Fusobacterium|nucleatum_subsp._vincentii*16 Bacteroidales_[G-2]|bacterium_HMT_274*2 Alloprevotella|tannerae*4 Actinomyces|*4 Actinomyces|sp._HMT_169*1 ASV28: Fusobacterium|nucleatum_subsp._vincentii*9 Porphyromonas|endodontalis*1 Mycoplasma|faucium Prevotella|*2 Campylobacter|rectus Fusobacterium|nucleatum_subsp._vincentii*4 Capnocytophaga|gingivalis*1 Campylobacter|*1 Oribacterium|sp._HMT_078 Alloprevotella|tannerae*2 Dialister|invisus Fusobacterium|*4 Fusobacterium|sp._HMT_203 Veillonella|parvula*12 Streptococcus|parasanguinis_clade_411*8 Veillonella|parvula*2 Streptococcus|*5 Parvimonas|micra Porphyromonas|endodontalis*2 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

4a. Buccal co-abundance network Height 4b. 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 medRxiv preprint (which wasnotcertifiedbypeerreview)istheauthor/funder,whohasgrantedmedRxivalicensetodisplaypreprintinperpetuity.

Lactobacillus|salivarius Atopobium| doi: Streptococcus|salivarius

Veillonella|parvula*8 https://doi.org/10.1101/2020.01.30.20019752 Prevotella|veroralis*11 Prevotella|veroralis*9 Prevotella|*3 Veillonella|parvula*11 Streptococcus|parasanguinis_clade_411*7 Stomatobaculum|longum Veillonella|parvula*13 Campylobacter|curvus Prevotella|*4 Veillonella|parvula*10 Streptococcus|parasanguinis_clade_411*8

Veillonella|parvula*2 All rightsreserved.Noreuseallowedwithoutpermission. Mogibacterium|timidum Veillonella|parvula*9 Atopobium|parvulum Veillonella|parvula*12 Prevotella|oralis Campylobacter|*1 Scardovia|wiggsiae Megasphaera|micronuciformis*2 Prevotella|veroralis*7 Veillonella|parvula*4 Streptococcus|mutans Veillonella|parvula*6 ASV67: Veillonella|parvula*15 ;

Veillonella|parvula*7 this versionpostedFebruary3,2020. Prevotella|pallens*2 Alloprevotella|sp._HMT_308 Solobacterium|moorei*2 Fusobacterium|*3 Haemophilus|parainfluenzae*3 Haemophilus|parainfluenzae*4 Neisseria| Eikenella|corrodens Fusobacterium|sp._HMT_203 Fusobacterium|nucleatum_subsp._vincentii*4 Haemophilus|parainfluenzae*2 Actinomyces|sp._HMT_180*3 Fusobacterium|*1 Haemophilus|parainfluenzae*1 Capnocytophaga|gingivalis*1 Bergeyella|sp._HMT_322*2 Prevotella|veroralis*3 Capnocytophaga|sputigena Haemophilus|parainfluenzae*5 Leptotrichia|*2 Abiotrophia|defectiva Rothia|aeria*3 The copyrightholderforthispreprint Haemophilus|parainfluenzae*7 ASV19: Streptococcus|parasanguinis_clade_411*6 Porphyromonas|pasteri*1 Neisseria|bacilliformis Streptococcus|*5 Actinomyces|sp._HMT_180*2 ASV06: Rothia|aeria*1 ASV27: Oribacterium|parvum Selenomonas| Lachnospiraceae_[G-2]|bacterium_HMT_096 ASV25: Prevotella|veroralis*6 Actinomyces|graevenitzii*2 Stomatobaculum|sp._HMT_097 Leptotrichia|wadei*2 Kingella|sp._HMT_012 Leptotrichia|*1 ASV28: Fusobacterium|nucleatum_subsp._vincentii*9 Rothia|aeria*2 Prevotella|nigrescens*2 Veillonella|parvula*5 Gemella|morbillorum*6 Streptococcus|*3 Granulicatella|elegans Haemophilus|influenzae*1 Actinomyces|sp._HMT_180*6 Corynebacterium|matruchotii Leptotrichia|wadei*1 Streptococcus|parasanguinis_clade_411*5 ASV23: Rothia| Streptococcus|*2 Bergeyella|sp._HMT_322*1 ASV13: Gemella|morbillorum*2 Neisseria|oralis Gemella|morbillorum*5 Streptococcus|parasanguinis_clade_411*3 Gemella|morbillorum*4 Lautropia|mirabilis Pasteurellaceae(F)|*1

Aggregatibacter|aphrophilus Buccal Cluster Dendrogram Pasteurellaceae(F)|*2 Rothia|aeria*4 Streptococcus|parasanguinis_clade_411*4 Actinomyces|sp._HMT_180*4 Prevotella|veroralis*2 Prevotella|denticola*1 Prevotella|oris Fusobacterium|nucleatum_subsp._vincentii*8 Alloprevotella|tannerae*2 Streptococcus|parasanguinis_clade_411*1 Prevotella|dentalis Streptococcus|parasanguinis_clade_411*10 Treponema|socranskii Bacteroidales_[G-2]|bacterium_HMT_274*2 Streptococcus|*7 Bacteroidales_[G-2]|bacterium_HMT_274*1 Fusobacterium|nucleatum_subsp._vincentii*6 Alloprevotella|tannerae*3 Leptotrichia|sp._HMT_215*1 Streptococcus|parasanguinis_clade_411*2 Actinomyces|*2 Actinomyces|sp._HMT_169*2 Porphyromonas|pasteri*3 ASV21: Streptococcus|*4 Porphyromonas|pasteri*4 Prevotella|nanceiensis*2 Actinomyces|sp._HMT_169*1 Dialister|invisus Fusobacterium|nucleatum_subsp._vincentii*11 Porphyromonas|endodontalis Prevotella|veroralis*1 Megasphaera|micronuciformis*3 Veillonella|parvula*1 Actinomyces|sp._HMT_180*1 Filifactor|alocis Solobacterium|moorei*1 Leptotrichia|wadei*4 Ruminococcaceae_[G-1]|bacterium_HMT_075 Fusobacterium|*2 Prevotella|pallens*1 Megasphaera|micronuciformis*1 Oribacterium|asaccharolyticum Campylobacter|rectus Streptococcus|*1 Prevotella|veroralis*10 Prevotella|veroralis*5 Veillonella|sp._HMT_780 Veillonella|parvula*20 Veillonella|parvula*19 Veillonella|parvula*18 Veillonella|parvula*17 Treponema| Tannerella|forsythia ASV70: Streptococcus|parasanguinis_clade_411*9 Streptococcus|*6 Staphylococcus|capitis Shuttleworthia|satelles Selenomonas|sputigena Saccharibacteria_(TM7)_[G-1]|bacterium_HMT_346 Rothia|aeria*6 Prevotella|veroralis*8 Prevotella|veroralis*13 Prevotella|nigrescens*1 Prevotella|nanceiensis*1 Prevotella|denticola*2 Prevotella|*1 Parvimonas|micra Mycoplasma|faucium Leptotrichia|wadei*3 Haemophilus|parainfluenzae*9 Gemella|morbillorum*1 Fusobacterium|nucleatum_subsp._vincentii*7 Fusobacterium|nucleatum_subsp._vincentii*5 Fusobacterium|nucleatum_subsp._vincentii*3 Fusobacterium|nucleatum_subsp._vincentii*2 Fusobacterium|nucleatum_subsp._vincentii*12 Fusobacterium|nucleatum_subsp._vincentii*10 Fusobacterium|nucleatum_subsp._vincentii*1 Fusobacterium|*4 Fretibacterium|fastidiosum Corynebacterium|durum*1 Campylobacter|*2 Atopobium|rimae Alloscardovia|omnicolens Alloprevotella|tannerae*1 Actinomyces|sp._HMT_180*7 Actinomyces|sp._HMT_180*5 Actinomyces|*1 Actinomyces|graevenitzii*1 Gemella|morbillorum*3 Actinomyces|sp._HMT_169*3 Oribacterium|sp._HMT_078 Prevotella|*2 Leptotrichia|sp._HMT_215*2 Corynebacterium|durum*2 Actinomyces|*3 Haemophilus|parainfluenzae*8 Veillonella|parvula*16 Rothia|aeria*5 Veillonella|parvula*14 Haemophilus|influenzae*2 Prevotella|pallens*3 Capnocytophaga|gingivalis*2 Haemophilus|parainfluenzae*6 Porphyromonas|pasteri*2 Prevotella|veroralis*12 Prevotella|veroralis*4 Capnocytophaga|granulosa Veillonella|parvula*3 Lachnoanaerobaculum| Saccharibacteria_(TM7)_[G-1]|bacterium_HMT_352 Granulicatella|adiacens Lachnoanaerobaculum|orale medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

5a. Duodenum co-abundance network medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

5b. Cluster Dendrogram Duodenum Height 0.0 1.5 Olsenella|uli Leptotr ichia|*1 Anaerococcus| Bifidobacter ium| Enterococcus|*1 Rothia|aer ia*2 Bradyrhiz obium | Enterococcus|*2 Enterococcus|*4 Dialister|in visus Fusobacter ium|*2 Enterococcus|*3 Escher ichia|coli*1 Finegoldia|magna Acinetobacter|lwoffii Escher ichia|coli*2 Eikenella|corrodens Parvimonas|micr a Streptococcus|*3 Kluyvera|ascorbata*1 Veillonella|par vula*15 Veillonella|par vula*6 Veillonella|par vula*8 Corynebacter ium(G)| Clostr idium|perfr ingens Anaerococcus|octa vius Kluyvera|ascorbata*2 Prevotella|nigrescens*2 Veillonella|par vula*2 Streptococcus|mutans Gemella|morbillorum*2 Staphylococcus|capitis Klebsiella|pneumoniae*2 Streptococcus|saliv arius Campylobacter|rectus Solobacter ium|moorei*3 Bifidobacter ium|animalis Enterobacter iaceae(F)|*2 Ralstonia|sp ._HMT_406 Dialister|pneumosintes Granulicatella|adiacens Klebsiella|pneumoniae*6 Klebsiella|pneumoniae*8 Gemella|morbillor um*6 Klebsiella|pneumoniae*1 Klebsiella|pneumoniae*5 Alloprevotella|tannerae*1 Alloprevotella|tannerae*3 Fretibacter ium|fastidiosum Klebsiella|pneumoniae*4 Klebsiella|pneumoniae*3 Desulfovibrio|fairfieldensis*1 Desulfovibrio|fairfieldensis*3 Enterobacter iaceae(F)|*1 Lawsonella|cle velandensis*1 Brevundimonas|diminuta Klebsiella|pneumoniae*11 Klebsiella|pneumoniae*9 Lawsonella|cle velandensis*2 Klebsiella|pneumoniae*7 Klebsiella|pneumoniae*12 Desulfovibrio|fairfieldensis*2 Lachnospiraceae_[XIV](F)| Klebsiella|pneumoniae*13 Haemophilus|parainfluenzae*3 Klebsiella|pneumoniae*10 Bacteroides|hepar inolyticus*4 Sphingomonas|y abuuchiae Bacteroides|hepar inolyticus*3 Bacteroides|hepar inolyticus*2 Bacteroides|hepar inolyticus*5 Peptostreptococcus|stomatis Desulfovibrio|fairfieldensis*4 Bacteroides|hepar inolyticus*1 Stomatobaculum|sp ._HMT_097 Megasphaera|micronucif ormis*4 ASV13: ASV67: Streptococcus|parasanguinis_clade_411*8 Neisser iaceae_[G−1]|bacter ium_HMT_327 Streptococcus|parasanguinis_clade_411*13 Streptococcus|parasanguinis_clade_411*5 Streptococcus|parasanguinis_clade_411*14 Streptococcus|parasanguinis_clade_411*11 Streptococcus|parasanguinis_clade_411*12 Streptococcus|parasanguinis_clade_411*7 Streptococcus|parasanguinis_clade_411*9 Streptococcus|parasanguinis_clade_411*10 Fusobacter ium|nucleatum_subsp ._vincentii*1 Ruminococcaceae_[G−2]|bacter ium_HMT_085 Peptoniphilaceae_[G−3]|bacter ium_HMT_929 Fusobacter ium|nucleatum_subsp ._vincentii*13 Fusobacter ium|nucleatum_subsp ._vincentii*2 Fusobacter ium|nucleatum_subsp ._vincentii*15 Fusobacter ium|nucleatum_subsp ._vincentii*14 Sacchar ibacter ia_(TM7)_[G−1]|bacter ium_HMT_352 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

6a. Jejunums co-abundance network Height 6b. 0.0 0.5 1.0 1.5 2.0 medRxiv preprint

Klebsiella|pneumoniae*11

Klebsiella|pneumoniae*9 (which wasnotcertifiedbypeerreview)istheauthor/funder,whohasgrantedmedRxivalicensetodisplaypreprintinperpetuity. Klebsiella|pneumoniae*10 Klebsiella|pneumoniae*7 Klebsiella|pneumoniae*15 Klebsiella|pneumoniae*4 Klebsiella|pneumoniae*16 Klebsiella|pneumoniae*2 Granulicatella|adiacens Streptococcus|parasanguinis_clade_411*7 Rothia|aer ia*2 Gemella|morbillorum*6 Streptococcus|*3 Veillonella|parvula*13 Veillonella|parvula*8 Stomatobaculum|longum doi: Veillonella|parvula*10 Megasphaera|micronucif or mis*3 Ruminococcaceae_[G−2]|bacter ium_HMT_085

Finegoldia|magna https://doi.org/10.1101/2020.01.30.20019752 Acinetobacter|junii Acinetobacter|sp._HMT_408 Bergey ella|sp ._HMT_322*3 Acidovo rax|ebreus Alloprevotella|rav a*3 Streptococcus|parasanguinis_clade_411*10 Veillonella|parvula*5 Prevotella|nanceiensis*2 Fusobacter ium|nucleatum_subsp._vincentii*18 Streptococcus|*7 Fusobacter ium|nucleatum_subsp._vincentii*8 Haemophilus|parainfluenzae*3 Neisser ia|*1 Haemophilus|influenzae*2 Klebsiella|pneumoniae*14 Or ibacter ium|asaccharolyticum*2 Haemophilus|parainfluenzae*7 Megasphaera|micronucifor mis*2 Fusobacter ium|* 3 ASV13: Gemella|morbillorum*2 Pa r vimonas|micra*2

Prev otella|veroralis*9 All rightsreserved.Noreuseallowedwithoutpermission. Treponema|socranskii Mycoplasma|salivar ium ASV67: Veillonella|parvula*15 Dialister|pneumosintes Leptotr ichia|sp._HMT_215*2 Pre v otella|*3 Prev otella|veroralis*4 Lactococcus|lactis Streptococcus|mutans Gemella|morbillorum*7 Scardovia|wiggsiae Streptococcus|parasanguinis_clade_411*8 Streptococcus|*8 Atopobium|par vulum Sacchar ibacteria_(TM7)_[G−1]|bacterium_HMT_349 Staph ylococcus|capitis*5 Allopre v otella|ra v a*5 Enterobacteriaceae(F)|*1 Fusobacter ium|*1 Haemophilus|parainfluenzae*1 V eillonella|par vula*2 Prev otella|nanceiensis*1 Campylobacter|cur vus Streptococcus|*2 Bergeyella|sp._HMT_322*1 Leptotr ichia|*1 ; Leptotrichia|sp._HMT_221 this versionpostedFebruary3,2020. Neisser ia|oralis Leptotrichia|*3 Leptotr ichia|w adei*1 Jejunumswab Cluster Dendrogram Fusobacter ium|*5 Lachnoanaerobaculum|*1 Or ibacter ium|asaccharolyticum*1 Veillonella|parvula*11 Atopobium | Streptococcus|saliv ar ius*1 Prevotella|pallens*3 Rothia|aer ia*4 Solobacterium|moorei*3 Actinomyces|lingnae*3 Mogibacterium | Streptococcus|parasanguinis_clade_411*15 Pa r vimonas|micra*1 Veillonella|parvula*18 Bifidobacter ium|longum Peptostreptococcaceae_[XI][G−9]|brachy Po rp hyromonas|endodontalis*1 Megasphaera|micronucifor mis*1 Actinomyces|sp._HMT_180*7 Fusobacter ium|nucleatum_subsp._vincentii*3 Alloscardovia|omnicolens Lachnoanaerobaculum|orale Pre votella|veroralis*6 Fusobacter ium|nucleatum_subsp._vincentii*6 Rothia|aer ia*1 Actinomyces|lingnae*1 Actinomyces|sp ._HMT_180*2 Gemella|morbillorum*1 Actinomyces|lingnae*2 Gemella|morbillorum*3 ASV28: Fusobacter ium|nucleatum_subsp._vincentii*9 Clostridiales_[F−3][G−1]|bacterium_HMT_876 Streptococcus|*9 Catonella|morbi Fusobacter ium|nucleatum_subsp._vincentii*2 Saccharibacter ia_(TM7)_[G−1]|bacter ium_HMT_347 Fusobacter ium|nucleatum_subsp._vincentii*19 The copyrightholderforthispreprint Fusobacter ium|nucleatum_subsp._vincentii*5 Veillonella|parvula*20 Veillonella|parvula*19 Veillonella|parvula*17 Veillonella|parvula*16 Streptococcus|saliv ar ius*2 Streptococcus|*10 Sacchar ibacteria_(TM7)_[G−3]|bacter ium_HMT_351 Or ibacter ium|par vum*2 Neisseria|bacillifor mi s Neisseria|*2 La wsonella|clev elandensis*2 Lachnospiraceae_[G−2]|bacter ium_HMT_096 Clostridium|perfr ingens Atopobium|r imae Actinomyces|sp._HMT_180*1 Actinomyces|sp._HMT_169*1 Actinomyces|grae v enitzii*3 Abiotrophia|defectiv a Acinetobacter|lwoffii*1 Enterococcus|*3 Leptothr ix|sp._HMT_266 Staph ylococcus|capitis*1 Kluyv era|ascorbata*3 ASV19: Streptococcus|parasanguinis_clade_411*6 Acinetobacter|lwoffii*3 Alloprev otella|ra v a*4 Dialister|invisus*2 Dialister|invisus Fusobacter ium|nucleatum_subsp._vincentii*1 Desulfovibrio|f airfieldensis*4 Peptostreptococcus|stomatis Megasphaera|micronucifor mis*4 Megasphaera|sp._HMT_123 Oribacter ium|sp ._HMT_078 Alloprev otella|ra v a*6 Neisseriaceae_[G−1]|bacter ium_HMT_327 Streptococcus|parasanguinis_clade_411*9 V eillonella|par vula*6 Acinetobacter|lwoffii*2 Alloprev otella|ra v a*1 Fusobacter ium|nucleatum_subsp._vincentii*12 Alloprev otella|ra v a*2 Klebsiella|pneumoniae*1 Porphyromonas|pasteri*3 Veillonella|parvula*22 Porphyromonas|pasteri*4 Veillonella|parvula*14 Gemella|morbillorum*4 Granulicatella|adiacens*2 Fusobacter ium|nucleatum_subsp._vincentii*14 Actinomyces|lingnae*4 Porphyromonas|paster i*2 Ruminococcaceae_[G−1]|bacter ium_HMT_075 Pa r vimonas|micra*3 Granulicatella|elegans ASV21: Streptococcus|*4 Rothia| Rothia|aer ia*3 Solobacter ium|moorei*1 Actinomyces|sp._HMT_180*4 Streptococcus|*1 Saccharibacteria_(TM7)_[G−1]|bacterium_HMT_352*2 Peptostreptococcaceae_[XI][G−1]|sulci Solobacter ium|moorei*2 Gemella|morbillorum*5 Streptococcus|parasanguinis_clade_411*4 Pre v otella|veroralis*11 Pre v otella|*4 Stomatobaculum|sp._HMT_097 Oribacterium|par vum*1 Streptococcus|parasanguinis_clade_411*5 Rothia|aer ia*5 Staph ylococcus|capitis*4 medRxiv preprint doi: https://doi.org/10.1101/2020.01.30.20019752; this version posted February 3, 2020. The copyright holder for this preprint 7a. Pancreatic(which wastumor not certified co-abundance by peer review) networkis the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. Height 7b. 0.0 1.5

Campylobacter|rectus

Prevotella|nigrescens*2 medRxiv preprint Dialister|pneumosintes Parvimonas|micra*2 Fusobacter ium|nucleatum_subsp ._vincentii*2 (which wasnotcertifiedbypeerreview)istheauthor/funder,whohasgrantedmedRxivalicensetodisplaypreprintinperpetuity. Klebsiella|pneumoniae*2 Klebsiella|pneumoniae*4 Klebsiella|pneumoniae*14 Desulf ovibr io|f airfieldensis*4 Enterococcus|*3 Acinetobacter|Iwoffii*2 Acinetobacter|lwoffii*1 doi: Klebsiella|pneumoniae*7

Acinetobacter|*3 https://doi.org/10.1101/2020.01.30.20019752 Sphingomonas| yabuuchiae Enterobacter iaceae(F)|*1 Cor ynebacter ium(G)| Streptococcus|mutans Lactococcus|lactis Staphylococcus|capitis*4 Staphylococcus|capitis*3 Ralstonia|sp ._HMT_406 Gemella |morbillorum*6 Streptococcus|*2 Lawsonella|cle velandensis*2 All rightsreserved.Noreuseallowedwithoutpermission.

Staphylococcus|capitis*1 PancreaticTumor Cluster Dendrogram Stenotrophomonas|maltophilia Streptococcus|saliv arius*3 Neisseriaceae_[G−1]|bacter ium_HMT_327 Klebsiella|pneumoniae*12 Klebsiella|pneumoniae*10 Enterobacter iaceae(F)|*3 Clostr idium|perfr ingens Dialister|in visus Cor ynebacter ium|kroppenstedtii Acinetobacter|*2 Acinetobacter|*1 Pseudomonas|

Bacteroides|hepar inolyticus*5 ; Bacteroides|hepar inolyticus*3 this versionpostedFebruary3,2020. Lachnoanaerobaculum|* 2 Morax ella|osloensis Lachnoanaerobaculum|*3 Bacteroides|hepar inolyticus*1 Lachnospiraceae_[XIV](F)| Campylobacter|*1 Fusobacter ium|nucleatum_subsp ._vincentii*6 Peptostreptococcus|stomatis Rothia| Bacteroides|hepar inolyticus*2 Streptococcus|*3 Rothia|aer ia*5 Streptococcus|saliv arius*1 Atopobium | Campylobacter|cur vus Streptococcus|parasanguinis_clade_411*7 Novosphingobium|panipatense Haemophilus|parainfluenzae*1 Veillonel la|par vula*2 Haemophilus|parainfluenzae*3 Granulicatella|adiacens The copyrightholderforthispreprint Rothia|aer ia*2 Streptococcus|parasanguinis_clade_411*9 Veillonella |parvula*6 Parvimonas|micra*1 Veillonella |par vula*5