bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Microbial function and genital inflammation in young South African women at high risk of HIV

2 infection

3

4 Arghavan Alisoltani1,2, Monalisa T. Manhanzva1, Matthys Potgieter3,4, Christina Balle5, Liam Bell6,

5 Elizabeth Ross6, Arash Iranzadeh3, Michelle du Plessis6, Nina Radzey1, Zac McDonald6, Bridget

6 Calder4, Imane Allali3,7, Nicola Mulder3,8,9, Smritee Dabee1,10, Shaun Barnabas1, Hoyam Gamieldien1,

7 Adam Godzik2, Jonathan M. Blackburn4,8, David L. Tabb8,11,12, Linda-Gail Bekker8,13, Heather B.

8 Jaspan5,8,10, Jo-Ann S. Passmore1,8,14,15, Lindi Masson1,8,14,16, 17*

9

10 1Division of Medical Virology, Department of Pathology, University of Cape Town, Cape Town 7925, South

11 Africa; 2Division of Biomedical Sciences, University of California Riverside School of Medicine, Riverside, CA

12 92521, USA; 3Computational Biology Division, Department of Integrative Biomedical Sciences, University of Cape

13 Town, Cape Town 7925, South Africa; 4Division of Chemical and Systems Biology, Department of Integrative

14 Biomedical Sciences, University of Cape Town, Cape Town 7925, South Africa. 5Division of Immunology,

15 Department of Pathology, University of Cape Town, Cape Town 7925, South Africa; 6Centre for Proteomic and

16 Genomic Research, Cape Town 7925, South Africa; 7Laboratory of Human Pathologies Biology, Department of

17 Biology and Genomic Center of Human Pathologies, Mohammed V University in Rabat, Morocco; 8Institute of

18 Infectious Disease and Molecular Medicine (IDM), University of Cape Town, Cape Town 7925, South Africa;

19 9Centre for Infectious Diseases Research (CIDRI) in Africa Wellcome Trust Centre, University of Cape Town,

20 Cape Town 7925, South Africa; 10Seattle Children’s Research Institute, University of Washington, WA 98101,

21 USA; 11Bioinformatics Unit, South African Tuberculosis Bioinformatics Initiative, Stellenbosch University,

22 Stellenbosch 7602, South Africa; 12DST–NRF Centre of Excellence for Biomedical Tuberculosis Research,

23 Stellenbosch University, Stellenbosch 7602, South Africa; 13Desmond Tutu HIV Centre, Cape Town 7925,

24 University of Cape Town, South Africa; 14Centre for the AIDS Programme of Research in South Africa, Durban

25 4013, South Africa; 15National Health Laboratory Service, Cape Town 7925, South Africa; 16Disease Elimination

26 Program, Life Sciences Discipline, Burnet Institute, Melbourne 3004, Australia; 17Central Clinical School, Monash

27 University, Melbourne 3004, Australia

28

29 *Corresponding author: Dr Lindi Masson; 85 Commercial Road, Melbourne, Victoria, Australia 3004; GPO Box

30 2284, Melbourne, Victoria, Australia 3001; Email: [email protected]

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Abstract

2 Background: Female genital tract (FGT) inflammation is an important risk factor for HIV acquisition.

3 The FGT microbiome is closely associated with inflammatory profile, however, the relative importance

4 of microbial activities has not been established. Since proteins are key elements representing actual

5 microbial functions, this study utilized metaproteomics to evaluate the relationship between FGT

6 microbial function and inflammation in 113 young and adolescent South African women at high risk of

7 HIV infection. Women were grouped as having low, medium or high FGT inflammation by K-means

8 clustering according to pro-inflammatory cytokine concentrations.

9 Results: A total of 3,186 microbial and human proteins were identified in lateral vaginal wall swabs

10 using liquid chromatography-tandem mass spectrometry, while 94 microbial taxa were included in the

11 taxonomic analysis. Both metaproteomics and 16S rRNA gene sequencing analyses showed

12 increased non-optimal and decreased lactobacilli in women with FGT inflammatory profiles.

13 However, differences in the predicted relative abundance of most bacteria were observed between

14 16S rRNA gene sequencing and metaproteomics analyses. Bacterial protein functional annotations

15 (gene ontology) predicted inflammatory cytokine profiles more accurately than bacterial relative

16 abundance determined by 16S rRNA gene sequence analysis, as well as functional predictions based

17 on 16S rRNA gene sequence data (p<0.0001). The majority of microbial biological processes were

18 underrepresented in women with high inflammation compared to those with low inflammation,

19 including a -associated signature of reduced cell wall organization and peptidoglycan

20 biosynthesis. This signature remained associated with high FGT inflammation in a subset of 74

21 women nine weeks later, was upheld after adjusting for Lactobacillus relative abundance, and was

22 associated with in vitro inflammatory cytokine responses to Lactobacillus isolates from the same

23 women. Reduced cell wall organization and peptidoglycan biosynthesis were also associated with

24 high FGT inflammation in an independent sample of ten women.

25 Conclusions: Both the presence of specific microbial taxa in the FGT and their properties and

26 activities are critical determinants of FGT inflammation. Our findings support those of previous studies

27 suggesting that peptidoglycan is directly immunosuppressive, and identify a possible avenue for

28 biotherapeutic development to reduce inflammation in the FGT. To facilitate further investigations of

29 microbial activities, we have developed the FGT-METAP application that is available at

30 (http://immunodb.org/FGTMetap/).

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Keywords

2 Metaproteomics; microbiome; microbial function; female genital tract; inflammation; cytokine

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Background

2 Despite the large reduction in AIDS-related deaths as a result of the expansion of HIV antiretroviral

3 programs, global HIV incidence has declined by only 16% since 2010 [1]. Of particular concern are

4 the extremely high rates of HIV infection in young South African women [1]. The behavioral and

5 biological factors underlying this increased risk are not fully understood, however, a key predisposing

6 factor that has been identified in this population is genital inflammation [2–4]. We have previously

7 shown that women with elevated concentrations of pro-inflammatory cytokines and chemokines in

8 their genital tracts were at higher risk of becoming infected with HIV [2]. Furthermore, the protection

9 offered by a topical antiretroviral tenofovir microbicide was compromised in women with evidence of

10 genital inflammation [3].

11

12 The mechanisms by which female genital tract (FGT) inflammation increases HIV risk are likely

13 multifactorial, and increased genital inflammatory cytokine concentrations may facilitate the

14 establishment of a productive HIV infection by recruiting and activating HIV target cells, directly

15 promoting HIV transcription and reducing epithelial barrier integrity [5–7]. Utilizing proteomics, Arnold,

16 et al. showed that proteins associated with tissue remodeling processes were up-regulated in the

17 FGT, while protein biomarkers of mucosal barrier integrity were down-regulated in individuals with an

18 inflammatory profile [7]. This suggests that inflammation increases tissue remodeling at the expense

19 of mucosal barrier integrity, which may in turn increase HIV acquisition risk [7]. Furthermore,

20 neutrophil-associated proteins, especially certain proteases, were positively associated with

21 inflammation and may be involved in disrupting the mucosal barrier [7]. This theory was further

22 validated by the finding that antiproteases associated with HIV resistance were increased in a cohort

23 of sex workers from Kenya [8].

24

25 (BV), non-optimal cervicovaginal bacteria, and sexually transmitted infections

26 (STIs), which are highly prevalent in South African women, are likely important drivers of genital

27 inflammation in this population [9,10]. BV and non-optimal bacteria have been consistently associated

28 with a marked increase in pro-inflammatory cytokine concentrations [9–12]. Conversely, women who

29 have “optimal” vaginal microbiota, primarily consisting of Lactobacillus spp., have low levels of genital

30 inflammation and a reduced risk of acquiring HIV [13]. Lactobacilli and the lactic acid that they

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 produce may actively suppress inflammatory responses and thus may play a significant role in

2 modulating immune profiles in the FGT [14–16]. However, partly due to the complexity and diversity of

3 the microbiome, the immunomodulatory mechanisms of specific vaginal bacterial species are not fully

4 understood [17]. Further adding to this complexity is the fact that substantial differences in the

5 cervicovaginal microbiota exist by geographical location and ethnicity and that the properties of

6 different strains within particular microbial species are also highly variable [15,18].

7

8 In addition to providing insight into host mucosal barrier function, metaproteomic studies of the FGT

9 can evaluate microbial activities and functions that may influence inflammatory profiles [19]. A recent

10 study highlighted the importance of vaginal microbial function by demonstrating that FGT bacteria,

11 such as Gardnerella vaginalis, modulate the efficacy of the tenofovir microbicide by active metabolism

12 of the drug [20]. Additionally, Zevin et al. found that bacterial functional profiles were associated with

13 epithelial barrier integrity and wound healing in the FGT [21], suggesting that microbial function may

14 also be closely linked to genital inflammation. The aim of the present study was to utilize both

15 metaproteomics and 16S rRNA gene sequencing to improve our understanding of the relationship

16 between microbial function and inflammatory profiles in the FGTs of South African women.

17

18

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Results

2 This study included 113 young and adolescent HIV-uninfected women (aged 16-22 years) residing in

3 Cape Town, South Africa [22]. Seventy-four of these women had samples and metadata available for

4 analysis at a second time-point nine weeks later. Ten women had samples available at the second

5 time-point, but not the first, and were included to validate the functional signature identified in the

6 primary analysis of this study. STI and BV prevalence were high in this cohort, with 48/113 (42%) of

7 the women having at least one STI and 56/111 (50%) of the women being BV positive by Nugent

8 scoring (Additional file 1: Table S1). The use of injectable contraceptives, BV, and chlamydia were

9 associated with increased genital inflammatory cytokines, as previously described [10,23].

10

11 FGT metaproteome associates with BV and inflammatory cytokine profiles

12 Liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis was conducted on the total

13 protein acquired from lateral vaginal wall swabs for all participants. Using principal component

14 analysis (PCA), it was found that women with BV clustered separately from women who were BV

15 negative according to the relative abundance of all host and microbial proteins identified (Additional

16 file 1: Fig. S1a). Women who were considered to have high levels of genital inflammation based on

17 pro-inflammatory cytokine concentrations (Additional file 1: Fig. S2) tended to cluster separately from

18 women with low inflammation, while no clear grouping was observed by STI status or chemokine

19 profiles (Additional file 1: Fig. S1b-d). After adjusting for potentially confounding variables including

20 age, contraceptives, prostate specific antigen (PSA) and co-infections, BV was associated with the

21 largest number of differentially abundant proteins, followed by inflammatory cytokine profiles, while

22 fewer associations were observed between chemokine profiles and protein relative abundance and

23 none between STI status and protein relative abundance (Additional file 1: Fig. S1e). Individual STIs

24 were also not associated with marked changes in the metaproteome, with Chlamydia trachomatis,

25 Neisseria gonorrhoeae, Trichomonas vaginalis and Mycoplasma genitalium associated with

26 significant changes in zero, five, seven and eleven proteins, respectively (after adjustment for

27 potential confounders and multiple comparisons). However, this stratified analysis is limited by the

28 small number of STI cases for some infections and the prevalence of co-infections.

29

30

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Differences in taxonomic assignment using metaproteomics and 16S rRNA gene sequencing

2 Of the proteins identified in lateral vaginal wall swab samples, 38.8% were human, 55.8% were

3 bacterial, 2.7% fungal, 0.4% archaeal, 0.09% of viral origin, and 2% grouped as “other” (not shown).

4 However, as the majority of taxa identified had <2 proteins detected, a more stringent cut-off was

5 applied to include only taxa with >3 detected proteins, or 2 proteins detected in multiple taxa. The final

6 curated dataset included 44% human, 55% bacterial (n=81 taxa), and 1% fungal proteins (n=13 taxa)

7 (Fig. 1a-d; Additional file 2: Table S2). When the relative abundance of the most abundant bacterial

8 taxa identified using metaproteomics was compared to the relative abundance of the most abundant

9 bacteria identified using 16S rRNA gene sequencing, a large degree of similarity was found at the

10 genus level. A total of 6/9 of the genera identified using 16S rRNA gene sequence analysis were also

11 identified using metaproteomics (Fig. 1e, f; Additional file 1: Fig. S3). As expected, both approaches

12 showed that multiple Lactobacillus species were less abundant in women with high versus low

13 inflammation, while BV-associated bacteria (including G. vaginalis, Prevotella species, Megasphaera

14 species, Sneathia amnii and Atopobium vaginae) were more abundant in women with high

15 inflammation compared to those with low inflammation (Additional file 2: Table S3). However,

16 species-level annotation differed between the 16S rRNA gene sequencing and metaproteomics

17 analyses, with metaproteomics identifying more lactobacilli at the species level, while

18 Lachnovaginosum genomospecies (previously known as BVAB1) was not identified using

19 metaproteomics. Additionally, the relative abundance of the taxa differed between 16S rRNA gene

20 sequencing and metaproteomics analyses.

21

22 Protein profiles differ between women defined as having high, medium or low FGT

23 inflammation

24 A total of 449, 165, and 39 host and microbial proteins were differentially abundant between women

25 with high versus low inflammation, medium versus high inflammation and medium versus low

26 inflammation, respectively (Additional file 2: Table S4). Microbial proteins that were more abundant in

27 women with high or medium inflammation versus low inflammation were mostly assigned to non-

28 optimal bacterial taxa (including G. vaginalis, Prevotella species, Megasphaera species, S. amnii and

29 A. vaginae). Less abundant microbial proteins in women with high inflammation were primarily of

30 Lactobacillus origin (Additional file 1: Fig. S4). Of the human proteins that were differentially abundant

7 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 according to inflammation status, 93 were less abundant and 132 were more abundant in women with

2 high compared to those with low inflammation. Human biological process gene ontologies (GOs) that

3 were significantly underrepresented in women with high versus low inflammation included positive

4 regulation of apoptotic signaling, establishment of endothelial intestinal barrier, ectoderm

5 development and cornification [false discovery rate (FDR) adj. p<0.0001 for all; Additional file 1: Fig.

6 S5]. Significantly overrepresented pathways in women with high versus low inflammation included

7 multiple inflammatory processes - such as chronic response to antigenic stimulus, positive regulation

8 of IL-6 production and inflammatory response (FDR adj. p<0.0001 for all; Additional file 1: Fig. S5).

9

10 Weighted correlation network analysis of microbial and host proteins identified five modules (clusters)

11 representing co-correlations between microbial and host proteins (Fig. 2a, b). The yellow module

12 included primarily L. iners proteins, while the turquoise module primarily consisted of L. crispatus and

13 host proteins. The grey and brown modules consisted entirely of host proteins and the blue module

14 included non-optimal bacteria and host proteins. Pro-inflammatory cytokines correlated inversely with

15 the Lactobacillus modules (yellow and turquoise) and positively with the grey, brown and blue

16 modules. Chemokines were significantly positively correlated with the brown and grey modules and

17 inversely with the turquoise module (Fig. 2c). Blue, yellow, turquoise and grey modules correlated

18 with BV Nugent score, while BV showed no significant correlation with the brown module (Fig. 2d).

19 The finding that the brown module did not include any microbial proteins and was not associated with

20 BV or STIs, suggests the presence of inflammatory processes independent of the microbiota. For

21 STIs, no significant correlations (p-value <0.05) with any of the five modules were detected (Fig. 2d).

22 To determine whether the co-correlations of microbial proteins with host proteins had functional

23 meaning, we profiled the top biological processes of proteins as shown in the row sidebar in Fig. 2a.

24 Host proteins that correlated negatively with the blue module (consisting of non-optimal bacteria) were

25 mostly involved in cell adhesion, cell-cell adhesion, cell-cell junction assembly, and regulation of cell-

26 adhesion pathways. This suggests reduced epithelial barrier function associated with G. vaginalis,

27 Prevotella spp., Megasphaera, and A. vaginae. Host proteins involved in immune system and

28 inflammatory response pathways were positively correlated with brown and grey modules and

29 inversely associated with the turquoise module including L. crispatus. Despite variability in the

30 microbial proteins and taxa, redundant microbial functional pathways were identified across different

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 modules, indicating that microbiota share a set of common metabolic functions (e.g. glucose

2 metabolism, protein translation and synthesis), as expected. Although most lactobacilli proteins were

3 negatively associated with BV and pro-inflammatory cytokines (see turquoise and yellow modules in

4 Fig. 2a), almost all L. iners proteins were clustered in the yellow module with slightly different

5 functional pathways from lactobacilli proteins in the turquoise module.

6

7 Microbial function predicts genital inflammation with greater accuracy than taxa relative

8 abundance

9 The accuracies of (i) microbial relative abundance (determined using 16S rRNA gene sequencing), (ii)

10 functional prediction based on 16S rRNA gene sequence data, (iii) microbial protein relative

11 abundance (determined using metaproteomics) and (iv) microbial molecular function (determined by

12 aggregation of GOs of proteins identified using metaproteomics) for prediction of genital inflammation

13 status (low, medium and high groups) were evaluated using random forest analysis. The three most

14 important species predicting genital inflammation grouping by 16S rRNA gene sequence data were

15 Sneathia sanguinegens, A. vaginae, and (Fig. 3a). The top functional pathways

16 based on 16S rRNA gene sequence data for the identification of women with inflammation were

17 prolactin signaling, biofilm formation, and tyrosine metabolism (Fig. 3b). Similarly, the most important

18 proteins predicting inflammation grouping by metaproteomics included two L. iners proteins

19 (chaperone protein DnaK and pyrophosphate phospho-hydrolase), two A. vaginae proteins

20 (glyceraldehyde-3-phosphate dehydrogenase and phosphoglycerate kinase), Megasphaera

21 genomosp. type 1 cold-shock DNA-binding domain protein, and L. crispatus ATP synthase subunit

22 alpha (Fig. 3c). The top molecular functions associated with genital inflammation included

23 oxidoreductase activity (of multiple proteins expressed primarily by lactobacilli, but also Prevotella and

24 Atopobium species), beta-phosphoglucomutase activity (expressed by lactobacilli) and GTPase

25 activity (of multiple proteins produced by mainly Prevotella, G. vaginalis, and some lactobacilli) as

26 illustrated in Fig. 3d. These findings suggest that the functions of key BV-associated bacteria and

27 lactobacilli are critical for determining the level of genital inflammation. It was also found that the out-

28 of-bag (OOB) error rate distribution was significantly lower for molecular function, followed by protein

29 relative abundance, functional prediction based on 16S rRNA gene sequence data and then taxa

30 relative abundance (Fig. 3e). This suggests that the activities of the bacteria present in the FGT play a

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 critical role in driving genital inflammation, in addition to the presence and relative abundance of

2 particular taxa.

3

4 Differences in microbial stress response, lactate dehydrogenase and metabolic pathways

5 between women with high versus low FGT inflammation

6 Microbial functional profiles were further investigated by comparing molecular functions, biological

7 processes and cellular component GO enrichment for detected proteins between women with low

8 versus high inflammation using differential expression analysis, adjusting for potentially confounding

9 variables including age, contraceptives, PSA and STIs (Fig. 4; Additional file 2: Table S5). The

10 majority of microbial molecular functions (90/107), biological processes (57/83) and cellular

11 components (14/23) were underrepresented in women with high compared to low genital

12 inflammation, with most of the differentially abundant metabolic processes, including peptide,

13 nucleoside, glutamine, and glycerophospholipid metabolic processes, decreased in women with

14 inflammation. The phosphoenolpyruvate-dependent sugar phosphotransferase system (a major

15 microbial carbohydrate uptake system including proteins assigned to Lactobacillus species, S. amniii,

16 A. vaginae, and G. vaginalis) was also underrepresented in women with evidence of genital

17 inflammation compared to women with low inflammation (FDR adj. p<0.0001; Fig. 4a). Additionally, L-

18 lactate dehydrogenase activity was inversely associated with inflammation (Additional file 2: Table S5)

19 and both large and small ribosomal subunit cellular components were underrepresented in those with

20 high FGT inflammation (Fig. 4b). The only metabolic processes that were overabundant in women

21 with high versus low inflammation were malate and ATP metabolic processes. Overrepresented

22 microbial biological processes in women with genital inflammation also included response to oxidative

23 stress and cellular components associated with cell division (e.g. mitotic spindle midzone) and

24 ubiquitination (e.g. Doa10p ubiquitin ligase complex, Hrd1p ubiquitin ligase ERAD-L complex, RQC

25 complex; Fig. 4).

26

27 Underrepresented Lactobacillus peptidoglycan, cell wall and membrane pathways in women

28 with high versus low FGT inflammation

29 Among the top biological processes inversely associated with inflammation were cell wall

30 organization, regulation of cell shape and peptidoglycan biosynthetic process (all adj. p<0.0001; Fig.

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 4a). The GO terms, regulation of cell shape, and peptidoglycan biosynthetic process included

2 identical proteins (e.g. D-alanylalanine synthetase, D-alanyl-D-alanine-adding enzyme and

3 Bifunctional protein GlmU), while cell wall organization included the same proteins plus D-alanyl

4 carrier protein. Although proteins involved in these processes were only detected for Lactobacillus

5 species and may thus be biomarkers of the increased relative abundance of lactobacilli, each

6 remained significantly associated with inflammation after adjusting for the relative abundance of L.

7 iners and non-iners Lactobacillus species (determined using 16S rRNA gene sequencing), as well as

8 STI status, PSA detection and hormonal contraceptive use [beta-coefficient: -1.68; 95% confidence

9 interval (CI): -2.84 to -0.52; p=0.004 for cell wall organization; beta-coefficient: -1.71; 95% CI: -2.91 to

10 -0.53; p=0.005 for both regulation of cell shape and peptidoglycan biosynthetic process]. This

11 suggests that the relationship between these processes and inflammation is independent of the

12 relative abundance of Lactobacillus species and that the cell wall and membrane properties of

13 lactobacilli may play a role in modulating FGT inflammation. Furthermore, we found that multiple cell

14 wall and membrane cellular components, including the S-layer, peptidoglycan-based cell wall, cell

15 wall and membrane, were similarly underrepresented in women with high versus low FGT

16 inflammation (Fig. 4b). Using metaproteomic data from an independent group of 10 women, we

17 similarly found that peptidoglycan biosynthetic process, cell wall organization, and regulation of cell

18 shape GOs were underrepresented in women with high inflammation (Fig. 5a).

19

20 To further investigate whether the associations between FGT inflammation and Lactobacillus

21 biological processes and cellular components were independent of Lactobacillus relative abundance,

22 we compared these GOs in BV negative women with Lactobacillus dominant communities who were

23 grouped (median splitting of samples based on the first principal component of nine pro-inflammatory

24 cytokine concentrations) as having low (n=20) versus high (n=20) inflammation (Fig. S6).

25 Interestingly, it was found that, even though only non-significant trends towards decreased relative

26 abundance of lactobacilli determined by 16S rRNA gene sequencing (p=0.73) and metaproteomics

27 (p=0.14) were observed, several previously identified GOs, including peptidoglycan-based cell wall

28 (p=0.01), cell wall organization (p=0.03), and S-layer (p=0.01), remained associated with the level of

29 inflammation (Fig. S6). This suggests an association between these biological processes and cellular

30 components that is not fully explained by the relative abundance of the lactobacilli themselves.

11 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1

2 In vitro confirmation of the role of Lactobacillus cell wall and membrane pathways in

3 regulating genital inflammation

4 As previous studies have suggested that the cell wall, as well as the cell membrane, may influence

5 the immunomodulatory properties of Lactobacillus species and that peptidoglycan may be directly

6 immunosuppressive [14,24], we further investigated the relationships between Lactobacillus cell wall

7 and membrane properties and inflammation using 64 Lactobacillus species that were isolated from

8 the same women [25]. From these lactobacilli, the 22 isolates that induced the lowest levels of pro-

9 inflammatory cytokine production by vaginal epithelial cells and the 22 isolates that induced the

10 greatest cytokine responses were selected and their protein profiles evaluated using LC-MS/MS. Cell

11 wall organization, regulation of cell shape, and peptidoglycan biosynthetic process that were

12 associated with genital inflammation in the FGT metaproteomic analysis of vaginal swab samples

13 were also significantly overrepresented in isolates that induced low compared to high levels of

14 inflammatory cytokines in vitro (Fig. 5b-d). Furthermore, plasma membrane, cell wall and membrane

15 cellular components were similarly overrepresented in isolates that induced low versus high levels of

16 inflammation (Fig. 5e-i). The cell wall cellular component included multiple proteins involved in

17 adhesion (adhesion exoprotein, LPXTG-motif cell wall anchor domain protein, collagen-binding

18 protein, mannose-specific adhesin, putative mucin binding protein, sortase-anchored surface protein

19 and surface anchor protein). The relative abundance of this cell wall cellular component correlated

20 positively with the level of adhesion of the isolates to vaginal epithelial cells in vitro (rho=0.30,

21 p=0.0476), which in turn correlated inversely with in vitro inflammatory cytokine production by these

22 cells in response to the isolates [25]. Importantly, the cell wall organization biological process included

23 multiple proteins involved in peptidoglycan biosynthesis, which has the potential to dampen immune

24 responses in a strain-dependent manner [14,26].

25

26 Changes in FGT metaproteomic profiles over time

27 To evaluate variations in FGT metaproteomic profiles associated with inflammatory profiles over time,

28 we compared proteins, taxa and biological process and cellular component GOs between two time-

29 points nine weeks apart (n=74). The grouping according to inflammatory cytokine profile was

30 consistent at both visits for 41/74 (55%) participants, including 7/12 (58%) women who received

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 antibiotic treatment between visits (Fig. 6a). The findings based on the second visit also confirmed the

2 role of inflammation as one of the drivers of variation in complex metaproteome data, and similar

3 patterns of data distribution according to inflammation status were observed at both visits (Fig. 6b).

4 When the proteins, taxa and functions that were most closely associated with inflammation grouping

5 (high vs. low) at the first visit, were evaluated at the second visit, overall profiles remained similar

6 between visits (Fig. 6c-e). Significantly lower relative abundance of Lactobacillus species and

7 increased non-optimal bacteria were observed in women with inflammation at the second visit (Fig.

8 6d). Similarly, cell wall organization, regulation of cell shape, and peptidoglycan biosynthetic process

9 were among the top biological processes associated with low versus high inflammation at both visits

10 (for other top GO terms see Fig. 6e). Among women assigned to the same inflammatory profile group

11 at both visits, the relative abundance of cell wall-related GO terms, malate metabolic process and

12 response to oxidative stress were highly consistent (Additional file 1: Fig. S7a-c). However, decreased

13 levels of inflammation between visits were associated with an increase in the relative abundance of

14 cell wall GOs and decreased response to oxidative stress and malate metabolic process GOs

15 (Additional file 1: Fig. S7d-f). On the other hand, increased levels of inflammation were associated

16 with decreased relative abundance of cell wall GOs and increased response to oxidative stress and

17 malate metabolic process GOs (Additional file1: Fig. S7g-i).

18

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Discussion

2 Understanding biological risk factors for HIV acquisition is critical for the development of effective HIV

3 prevention strategies. FGT inflammation, defined by elevated inflammatory cytokine levels, has been

4 identified as a key factor associated with HIV infection risk. However, the causes and underlying

5 mechanisms are not fully understood. In this study, we demonstrate that, while the presence of

6 particular microorganisms, including non-optimal bacteria and Lactobacillus species, is important for

7 modulating FGT inflammation, their activities and functions are likely of greater importance. We

8 showed that the molecular functions of bacterial proteins predicted FGT inflammation status more

9 accurately than taxa relative abundance determined using 16S rRNA gene sequencing, as well as

10 functional predictions based on 16S rRNA gene sequencing analysis. The majority of microbial

11 biological processes were underrepresented in women with high compared to low inflammation

12 (57/83), including metabolic pathways and a strong signature of reduced Lactobacillus-associated cell

13 wall organization and peptidoglycan biosynthesis. This signature remained associated with FGT

14 inflammation after adjusting for the relative abundance of lactobacilli, as well as other potential

15 confounders (such as hormone contraceptive use, semen exposure, STIs), and was also associated

16 with inflammatory cytokine induction in vaginal epithelial cells in response to Lactobacillus isolates in

17 vitro. We observed a high level of consistency between FGT metaproteomics profiles of the same

18 women at two time points nine weeks apart, particularly among those with the same level of

19 inflammation at both visits. However, cell wall organization and peptidoglycan biosynthesis increased

20 between visits in women who moved from a higher to a lower inflammation grouping and decreased in

21 women who moved from a lower to a higher inflammation grouping.

22

23 While we found a large degree of similarity between metaproteomics and 16S rRNA gene sequence

24 taxonomic assignment determined for the same women [27], important differences in the predicted

25 relative abundance of most bacteria were observed. Differences in the relative abundance of certain

26 taxa observed using metaproteomics and 16S rRNA gene sequence data is not surprising as

27 taxonomic assignment from metaproteomic approaches also indicates functional activity (measured

28 by the amount of expressed protein), rather than simply the abundance of a particular bacterial taxon.

29 Additionally, others have noted that 16S rRNA gene sequencing does not have the ability to

30 distinguish between live bacteria and transient DNA [28]. We also observed differences that are likely

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 due to database limitations. For example, Lachnovaginosum genomospecies (previously known as

2 BVAB1) was not identified in the metaproteomics analysis. This is due to the fact that this species

3 was not present in the UniProt or NCBI databases [29]. However, the Clostridium and Ruminococcus

4 species which were identified are likely to be Lachnovaginosum genomospecies since these taxa fall

5 into the same Clostridiales order. Additionally, Clostridium and Ruminococcus species were more

6 abundant in women with high compared to low inflammation, as expected for

7 Lachnovaginosum genomospecies [10]. Metaproteomic analysis was able to identify lactobacilli to

8 species level, while sequencing of the V4 region of the 16S rRNA gene had more limited resolution for

9 this genus, as expected.

10

11 The finding that fungal protein relative abundance was substantially lower than bacterial protein

12 relative abundance is not surprising as it has been estimated that the total number of fungal cells is

13 orders of magnitude lower than the number of bacterial cells in the human body [30]. Although

14 Candida proteins were detected, the curated dataset (following removal of taxa that had only 1 protein

15 detected or 2 proteins detected in only a single sample) did not include any Candida species. This

16 may be due to the relative rarity of fungal cells in these samples and it would be interesting to

17 evaluate alternative methodology for sample processing to better resolve this population in the future

18 [31]. A limitation of microbial functional analysis using metaproteomics is that, at present, there is

19 generally sparse population of functional information in the databases for the majority of the

20 microbiome. Thus, the findings of this study will be biased toward the microbes that are better

21 annotated. Nonetheless, microbial function was closely associated with genital inflammatory profiles

22 and it is possible that this relationship may be even stronger should better annotation exist. Another

23 limitation of this study is that the in vitro Lactobacillus analysis was conducted using a transformed

24 cell line system that is a greatly simplified model and cannot recapitulate the complexity of the FGT

25 and the microbiome. It is however significant to note that similar proteomic signatures were noted

26 both in vitro and in vivo, regardless of this limitation. To enable researchers to further study

27 associations between microbial composition and function with BV, STIs, and FGT inflammation by

28 conducting detailed analysis of specific proteins, taxa and GO terms of interest, we have developed

29 the “FGT-METAP” online application. The application is available at (http://immunodb.org/FGTMetap/)

15 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 and allows users to repeat the analyses presented here, as well as obtain additional information by

2 analyzing individual proteins, species, and pathways.

3

4 A significant functional signature associated with FGT inflammation grouping included Lactobacillus-

5 associated cell wall, peptidoglycan and cell membrane biological processes and cellular components.

6 Although these functions were linked exclusively to Lactobacillus species, the associations with FGT

7 inflammation were upheld after adjusting for Lactobacillus relative abundance determined by 16S

8 rRNA gene sequencing and were also evident in an analysis including only women with Lactobacillus-

9 dominant microbiota. This profile was confirmed when these GOs were compared between

10 Lactobacillus isolates that induced high versus low levels of cytokine production in vitro. Previous

11 studies have suggested that the peptidoglycan structure, the proteins present in the cell wall, as well

12 as the cell membrane, may influence the immunomodulatory properties of Lactobacillus species

13 [14,24]. In a murine model, administration of peptidoglycan extracted from gut lactobacilli was able to

14 rescue the mice from colitis [26]. This activity was dependent on the NOD2 pathway and correlated

15 with an upregulation of the indoleamine 2,3-dioxygenase immunosuppressive pathway. The

16 immunomodulatory properties of peptidoglycan were furthermore dependent on the strain of

17 lactobacilli from which the peptidoglycan was isolated [26]. The increase in cell wall organization and

18 peptidoglycan biosynthesis that accompanied a reduction in local inflammation suggests an increase

19 in Lactobacillus strains producing relatively large amounts of proteins involved in these processes. In

20 a previous study, we showed that adhesion of lactobacilli to vaginal epithelial cells in vitro was

21 inversely associated with cytokine responses, suggesting that direct interaction between the isolates

22 and vaginal epithelial cells is critical for immunoregulation [25]. In addition to cell wall and membrane

23 properties, L-lactate dehydrogenase activity was identified as a molecular function inversely

24 associated with inflammation. Similarly, D-lactate dehydrogenase relative abundance and D-lactate

25 production by the Lactobacillus isolates included in this study were also associated with inflammatory

26 profiles in an in vitro analysis [25]. These findings support the results of previous studies showing that

27 lactic acid has immunoregulatory properties [16]. Taken together, these findings suggest that

28 lactobacilli may modulate the inflammatory environment in the FGT through multiple mechanisms. It

29 was further found that L. crispatus was more strongly associated with low inflammation in the

30 differential abundance analysis and that L. iners was the Lactobacillus sp. most frequently detected in

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 women with high inflammation. The co-expression analysis revealed that the majority of L. crispatus

2 proteins grouped separately to L. iners proteins. However, both the L. crispatus and the L. iners

3 modules were strongly inversely associated with inflammatory cytokine and chemokine

4 concentrations. This is interesting since L. crispatus dominance is considered optimal, while the role

5 of L. iners, the most prevalent Lactobacillus species in African women [10,11], is poorly understood

6 and it has been associated with compositional instability and transition to a non-optimal microbiota, as

7 well as increased risk of STI acquisition [32,33].

8

9 In this study, we observed a similar host proteome profile associated with high FGT inflammation

10 compared to previous studies [7]. Multiple inflammatory pathways were overrepresented in women

11 with high versus low FGT inflammation, while signatures of reduced barrier function were observed in

12 women with high inflammation, with underabundant endothelial, ectoderm and tight junction biological

13 processes. These findings similarly suggest that genital inflammation may be associated with

14 epithelial barrier function.

15

16 Conclusions

17 The link between FGT microbial function and local inflammatory responses described in this study

18 suggests that both the presence of specific microbial taxa in the FGT and their properties and

19 activities likely play a critical role in modulating inflammation. Currently, the annotation of microbial

20 functions is sparse, but with ever-increasing amounts of high-quality, high-throughput data, the

21 available information will improve steadily. The findings of the present study contribute to our

22 understanding of the mechanisms by which the microbiota may influence local immunity, and in turn

23 alter the risk of HIV infection. Additionally, the analyses described herein identify specific microbial

24 properties that may be harnessed for biotherapeutic development.

25

17 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Methods

2 Participants

3 This study included sexually experienced HIV-negative adolescent girls and young women (16-22

4 years) from the Women’s Initiative in Sexual Health (WISH) study in Cape Town, South Africa [22].

5 While the parent study cohort comprised 149 women, the present sub-study included 113 women who

6 had both vaginal swabs available for metaproteomics analysis and menstrual cup (MC) cervicovaginal

7 secretions available for cytokine profiling. Seventy-four of these women had specimens available at a

8 second time-point 9 weeks later (interquartile range: 9-11 weeks) for metaproteomics analysis. Of

9 these women, 12 received antibiotic treatment for laboratory-diagnosed or symptomatic STIs and/or

10 BV (doxycycline, metronidazole, azithromycin, cefixime, ceftriaxone and/or amoxicillin) between study

11 visits, while 2 received topical antifungal treatment (clotrimazole vaginal cream). Ten additional

12 women who did not have visit 1 samples available for analysis were included for validation purposes.

13 The University of Cape Town Faculty of Health Sciences Human Research Ethics Committee (HREC

14 REF: 267/2013) approved this study. Women ≥18 years provided written informed consent, those <18

15 years provided informed assent and informed consent was obtained from parents/guardians.

16

17 Definition of genital inflammation based on cytokine concentrations

18 For cytokine measurement, MCs (Softcup®, Evofem Inc, USA) were inserted by the clinician and kept

19 in place for an hour, after which they were transported to the lab at 4ºC and processed within 4 hours

20 of removal from the participants. MCs were centrifuged and the cervicovaginal secretions were re-

21 suspended in phosphate buffered saline (PBS) at a ratio of 1ml of mucus: 4 ml of PBS and stored at -

22 80ºC until cytokine measurement. Prior to cytokine measurement, MC secretions were pre-filtered

23 using 0.2 μm cellulose acetate filters (Sigma-Aldrich, MO, USA). The concentrations of 48 cytokines

24 were measured in MC samples using Luminex (Bio-Rad Laboratories Inc®, CA, USA) [23]. K-means

25 clustering was used to identify women with low, medium and high pro-inflammatory cytokine

26 [interleukin (IL)-1α, IL-1β, IL-6, IL-12p40, IL-12p70, tumor necrosis factor (TNF)-α, TNF-β, TNF-

27 related apoptosis-inducing ligand (TRAIL), interferon (IFN)-γ] and chemokine profiles [cutaneous T-

28 cell-attracting chemokine (CTACK), eotaxin, growth regulated oncogene (GRO)-α, IL-8, IL-16, IFN-γ-

29 induced protein (IP)-10, monocyte chemoattractant protein (MCP)-1, MCP-3, monokine induced by

18 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 IFN-γ (MIG), macrophage inflammatory protein (MIP)-1α, MIP-1β, regulated on activation, normal T

2 cell expressed and secreted (RANTES)] (Additional file 1: Fig. S2).

3

4 STI and BV diagnosis and evaluation of semen contamination

5 Vulvovaginal swab samples were screened for common STIs, including Chlamydia trachomatis,

6 Neisseria gonorrhoeae, Trichomonas vaginalis, Mycoplasma genitalium, herpes simplex virus (HSV)

7 types 1 and 2, Treponema pallidum and Haemophilus ducreyi, using real-time multiplex PCR assays

8 [22]. Lateral vaginal wall swabs were collected for BV assessment by Nugent scoring. Blood was

9 collected for HIV rapid testing (Alere Determine™ HIV-1/2 Ag/Ab Combo, Alere, USA). PSA was

10 measured in lateral vaginal wall swabs using Human Kallikrein 3/PSA Quantikine ELISA kits (R&D

11 Systems, MN, USA).

12

13 Discovery metaproteomics and analysis

14 For shotgun LC-MS/MS, lateral vaginal wall swabs were collected, placed in 1 ml PBS, transported to

15 the laboratory at 4ºC and stored at -80°C. Samples were computationally randomized for processing

16 and analysis. The stored swabs were thawed overnight at 4°C before each swab sample was

17 vortexed for 30 seconds, the mucus scraped off the swab on the side of the tube and vortexed for an

18 additional 10 seconds. The samples were then clarified by centrifugation and the protein content of

19 each supernatant was determined using the Quanti-Pro bicinchoninic acid (BCA) assay kit (Sigma-

20 Aldrich, MO, USA). Equal protein amounts (100 μg) were denatured with urea exchange buffer,

21 filtered, reduced with dithiothreitol, alkylated with iodoacetamide and washed with hydroxyethyl

22 piperazineethanesulfonic acid (HEPES). Acetone precipitation/formic acid (FA; Sigma-Aldrich, MO,

23 USA) solubilisation was added as a further sample clean-up procedure. The samples were then

24 incubated overnight with trypsin (Promega, WI, USA) and the peptides were eluted with HEPES and

25 dried via vacuum centrifugation. Reversed-phase liquid chromatography was used for desalting using

26 a step-function gradient. The eluted fractions were dried via vacuum centrifugation and kept at -80°C

27 until analysis. LC-MS/MS analysis was conducted on a Q-Exactive quadrupole-Orbitrap MS (Thermo

28 Fisher Scientific, MA, USA) coupled with a Dionex UltiMate 3000 nano-UPLC system (120min per

29 sample). The solvent system included Solvent A: 0.1% Formic Acid (FA) in LC grade water (Burdick

30 and Jackson, NJ, USA); and Solvent B: 0.1% FA in Acetonitrile (ACN; Burdick & Jackson, NJ, USA).

19 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 The MS was operated in positive ion mode with a capillary temperature of 320°C, and 1.95 kV

2 electrospray voltage was applied. All 187 samples were evaluated in 11 batches over a period of 3

3 months. A quality control consisting of pooled lateral vaginal swab eluants from the study participants

4 was run at least twice with every batch. Preliminary quality control analysis was conducted, and

5 flagged samples/batches were rerun.

6

7 Due to the important role of database in metaproteomic analysis, we compared two different

8 databases to identify proteins in our dataset: (i) UniProt database restricted to human and microbial

9 entries (73,910,451, release August-2017) and filtered using the MetaNovo pipeline [34]; (ii) vaginal

10 metagenome-based database with human proteins obtained from the study of Afiuni-Zadeh et al. [35].

11 The raw data were processed using MaxQuant version 1.5.7.4 against each database, separately.

12 The detailed parameters are provided in Additional file 1: Table S6. In brief, methionine oxidation and

13 acetylation of the protein N-terminal amino acid were considered as variable modifications and

14 carbamidomethyl (C) as a fixed modification. The digestion enzyme was trypsin with a maximum of

15 two missed cleavages. The taxonomic analysis of the proteins identified using both of the databases

16 was comparable, however, the first database (filtered UniProt human and microbial entries) resulted

17 in the identification of a greater number of proteins compared to the vaginal metagenome database,

18 whilst exhibiting a high degree of similarity compared to the 16S rRNA gene sequence data.

19 Therefore, the data generated using the first database was used for downstream analysis.

20

21 For quality control analysis, raw protein intensities and log10-transformed iBAQ intensities of the

22 quality control pool, as well as the raw protein intensities of the clinical samples, were compared

23 between batches (not shown). As variation in raw intensity was observed, all data were adjusted for

24 batch number prior to downstream analysis. The cleaned and transformed iBAQ values were used to

25 identify significantly differentially abundant proteins according to inflammatory cytokine profiles, BV

26 status or STIs using the limma R package [36]. The p-values were obtained from the moderated t-test

27 after adjusting for confounding variables including age, contraceptives, PSA, and STIs. Proteins with

28 FDR adjusted p-values <0.05 and log2-transformed fold change > 1.2 or < -1.2 were considered as

29 significantly differentially abundant. To define significantly differentially abundant taxa and GOs,

30 aggregated iBAQ values of proteins of the same species or GO IDs were compared between women

20 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 with low, medium and high inflammation using the limma R package. The aggregation was performed

2 separately for host and microbial proteins. The p-values along with GO IDs were uploaded to

3 REVIGO (http://revigo.irb.hr/) to visualize the top 50 biological processes with the lowest p-values.

4

5 The effects of BV, STIs and chemokine and pro-inflammatory cytokine profiles on the metaproteome

6 were investigated by PCA using the mixOmics R package [37]. The ComplexHeatmap package [38]

7 was used to generate heatmaps and cluster the samples and metaproteomes based on the

8 hierarchical and k-means clustering methods. Additionally, R packages EnhancedVolcano, weighted

9 correlation network analysis (WGCNA) [39], and FlipPlots were applied for plotting volcano plots, co-

10 correlation analysis, and Sankey diagrams, respectively. Basic R functions and the ggplot2 R

11 package were used for data manipulation, transformation, normalization and generation of graphics.

12 To identify the key factors distinguishing women defined as having low, medium and high

13 inflammation in their FGTs, we used random forest analysis using the R package randomForest [40]

14 using the following settings: (i) the type of random forest was classification, (ii) the number of trees

15 was 2000, and (iii) the number of variables evaluated at each split was one-third of the total number of

16 variables [proteins/molecular function GO IDs/16S rRNA gene-based operational taxonomic units

17 (OTUs)]. To compare the accuracy of the protein-, function- and species-based models, we iterated

18 each random forest model 100 times for each of the input datasets separately. The distributions of the

19 OOBs for 100 models for each data type were then compared using t-tests. For the WGCNA analysis,

20 a microbial-host weighted co-correlation network was built using iBAQ values for all proteins and

21 modules were identified using dynamic tree cut. Correlations between proteins and modules were

22 then determined and the top taxa and biological processes assigned to each of the proteins identified.

23 Spearman’s rank correlation was used to determine correlation coefficients between individual

24 cytokines/chemokines and module eigengenes (the first principal component of the expression matrix

25 of the corresponding module) for all samples. Average Spearman’s correlation coefficients for all pro-

26 inflammatory cytokines and chemokines were then determined. Similarly, Spearman’s correlation

27 coefficients were calculated between module eigengenes and Nugent scores and STIs (including C.

28 trachomatis, N. gonorrhoeae, T. vaginalis, M. genitalium and active HSV-2).

29

30

21 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 16S rRNA gene sequencing and analysis

2 Bacterial 16S rRNA gene sequencing and analysis was conducted as previously described [27].

3 Briefly, the V4 hypervariable region of the 16S rRNA gene was amplified using modified universal

4 primers [41]. Duplicate samples were pooled, purified using Agencourt AMPure XP beads (Beckman

5 Coulter, CA, USA) and quantified using the Qubit dsDNA HS Assay (Life Technologies, CA, USA).

6 Illumina sequencing adapters and dual-index barcodes were added to the purified amplicon products

7 using limited cycle PCR and the Nextera XT Index Kit (Illumina, CA, USA). Amplicons from 96

8 samples and controls were pooled in equimolar amounts and the resultant libraries were purified by

9 gel extraction and quantified (Qiagen, Germany). The libraries were sequenced on an Illumina MiSeq

10 platform (300 bp paired-end with v3 chemistry). Following de-multiplexing, raw reads were

11 preprocessed, merged and filtered using DADA2 [42]. Primer sequences were removed using a

12 custom python script and the reads truncated at 250 bp. Taxonomic annotation was based on a

13 customized version of SILVA reference database. Only samples with ≥ 2000 reads were included in

14 further analyses. The representative sequences for each amplicon sequence variant were BLAST

15 searched against NCBI non-redundant protein and 16S rRNA gene sequence databases for further

16 validation, as well as enrichment of the taxonomic classifications. Microbial functional predictions

17 were performed using Piphillin server [43] against the Kyoto Encyclopedia of Genes and Genomes

18 (KEGG) database (99% identity cutoff). The relative abundance of functional pathways (KEGG level

19 3) was used for RF analysis.

20

21 Lactobacillus isolation

22 Lactobacilli were isolated from cervicovaginal secretions collected using MCs by culturing in de Man

23 Rogosa and Sharpe (MRS) broth for 48 hours at 37°C under anaerobic conditions. The cultures were

24 streaked onto MRS agar plates under the same culture conditions, single colonies were picked and

25 then pre-screened microscopically by Gram staining. Matrix Assisted Laser Desorption Ionization

26 Time of Flight (MALDI-TOF) was conducted to identify the bacteria to species level. A total of 115

27 lactobacilli were isolated and stored at -80°C in a final concentration of 60% glycerol. From these, 64

28 isolates from 25 of the study participants were selected for evaluation of inflammatory profiles.

29

30 Vaginal epithelial cell stimulation and measurement of cytokine concentrations

22 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 As described previously [25], vaginal epithelial cells (VK2/E6E7 ATCC® CRL-2616™; RRID:CVCL_6471)

2 were maintained in complete keratinocyte serum free media (KSFM) supplemented with 0.4 mM

3 calcium chloride, 0.05 mg/ml of bovine pituitary extract, 0.1 ng/ml human recombinant epithelial

4 growth factor and 50 U/ml penicillin and 50 U/ml streptomycin (Sigma-Aldrich, MO, USA). The VK2

5 cells were seeded into 24-well tissue culture plates, incubated at 37°C in the presence of 5% carbon

6 dioxide and grown to confluency. Lactobacilli, adjusted to 4.18 x 106 colony forming units (CFU)/ml in

7 antibiotic-free KSFM, were used to stimulate VK2 cells for 24 hours at 37°C in the presence of 5%

8 carbon dioxide. As previously described, G. vaginalis ATCC 14018 cultures standardized to 1 x 107

9 CFU/ml in antibiotic-free KSFM were used as a positive control [25]. Production of IL-6, IL-8, IL-1α, IL-

10 1β, IP-10, MIP-3α, MIP-1α, MIP-1β and IL-1RA were measured using a Magnetic Luminex Screening

11 Assay kit (R&D Systems, MN, USA) and a Bio-PlexTM Suspension Array Reader (Bio-Rad

12 Laboratories Inc®, CA, USA). VK2 cell viability following bacterial stimulation was confirmed using the

13 Trypan blue exclusion assay. PCA was used to compare the overall inflammatory profiles of the

14 isolates and to select the 22 most inflammatory and 22 least inflammatory isolates for characterization

15 using proteomics.

16

17 Characterization of Lactobacillus protein expression in vitro using mass spectrometry

18 Forty-four Lactobacillus isolates were adjusted to 4.18 x 106 CFU/ml in MRS and incubated for 24

19 hours under anaerobic conditions. Following incubation, the cultures were centrifuged and the pellets

20 washed 3x with PBS. Protein was extracted by resuspending the pellets in 100 mM triethylammonium

21 bicarbonate (TEAB; Sigma-Aldrich, MO, USA) 4% sodium dodecyl sulfate (SDS; Sigma-Aldrich, MO,

22 USA). Samples were sonicated in a sonicating water bath and subsequently incubated at 95°C for

23 10min. Nucleic acids were degraded using benzonase nuclease (Sigma-Aldrich, MO, USA) and

24 samples were clarified by centrifugation at 10 000xg for 10 min. Quantification was performed using

25 the Quanti-Pro BCA assay kit (Sigma-Aldrich, MO, USA). HILIC beads (ReSyn Biosciences, South

26 Africa) were washed with 250 μl wash buffer [15% ACN, 100 mM Ammonium acetate (Sigma-Aldrich,

27 MO, USA) pH 4.5]. The beads were then resuspended in loading buffer (30% ACN, 200mM

28 Ammonium acetate pH 4.5). A total of 50 μg of protein from each sample was transferred to a protein

29 LoBind plate (Merck, NJ, USA). Protein was reduced with tris (2-carboxyethyl) phosphine (Sigma-

30 Aldrich, MO, USA) and alkylated with methylmethanethiosulphonate (MMTS; Sigma-Aldrich, MO,

23 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 USA). HILIC magnetic beads were added at an equal volume to that of the sample and a ratio of 5:1

2 total protein and incubated on the shaker at 900 rpm for 30 min. After binding, the beads were

3 washed four times with 95% ACN. Protein was digested by incubation with trypsin for four hours and

4 the supernatant containing peptides was removed and dried down in a vacuum centrifuge. LC-MS/MS

5 analysis was conducted with a Q-Exactive quadrupole-Orbitrap MS as described above. Raw files

6 were processed with MaxQuant version 1.5.7.4 against a database including the Lactobacillus genus

7 and common contaminants. Log10-transformed iBAQ intensities were compared using Mann-Whitney

8 U test and p-values were adjusted for multiple comparisons using an FDR step-down procedure.

9

10 FGT-METAP application

11 We have developed FGT-METAP (Female Genital Tract Metaproteomics), an online application for

12 exploring and mining the FGT microbial composition and function of South African women at high risk

13 of HIV infection. The app is available at (http://immunodb.org/FGTMetap/) and allows users to repeat

14 the analyses presented here, as well as perform additional analyses and investigation using the

15 metaproteomic data generated in this study.

16

24 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Abbreviations

2 BP: biological processes

3 BV: Bacterial vaginosis

4 CC: cellular component

5 FDR: false discovery rate

6 FGT: Female genital tract

7 GO: gene ontology

8 iBAQ: intensity-based absolute quantification

9 LC-MS/MS: liquid chromatography-tandem mass spectrometry

10 MF: molecular function

11 OOB: out-of-bag

12 PCA: principal component analysis

13 PSA: prostate specific antigen

14 RF: random forest

15 STIs: sexually transmitted infections

16 WGCNA: Weighted correlation network analysis

17

25 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Declarations

2 Ethics approval and consent to participate

3 The University of Cape Town Faculty of Health Sciences Human Research Ethics Committee (HREC

4 REF: 267/2013) approved this study. Women ≥18 years provided written informed consent, those <18

5 years provided informed assent and informed consent was obtained from parents/guardians.

6

7 Consent for publication

8 Not applicable.

9

10 Availability of data and materials

11 The dataset supporting the conclusions of this article is available in the FGT-METAP application,

12 http://immunodb.org/FGTMetap/. Data are available from the corresponding author upon reasonable

13 request. The protein accession IDs provided at http://immunodb.org/FGTMetap/ reflect UniProt IDs of

14 each protein and additional information for each protein is provided based on UniProt and NCBI

15 databases.

16

17 Competing interests

18 The authors declare that they have no competing interests.

19

20 Funding

21 The laboratory work, data analysis and writing were supported by the Carnegie Corporation of New

22 York, South African National Research Foundation (NRF), the Poliomyelitis Research Foundation,

23 and the South African Medical Research Council (PI: L. Masson). The WISH cohort was supported by

24 the European and Developing Countries Clinical Trials Partnership Strategic Primer grant

25 (SP.2011.41304.038; PI J.S. Passmore) and the South African Department of Science and

26 Technology (DST/CON 0260/2012; PI J.S. Passmore). J.M.B. thanks the NRF for a SARChI Chair.

27 D.L.T. was supported by the South African Tuberculosis Bioinformatics Initiative (SATBBI), a Strategic

28 Health Innovation Partnership grant from the South African Medical Research Council and South

29 African Department of Science and Technology.

30

26 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Authors' contributions

2 AA analyzed the data, developed the FGT-METAP application, and wrote the manuscript; MTM, CB,

3 SD, and SB conducted some of the laboratory experiments, analyzed the data and contributed to

4 manuscript preparation; MP, AI, NR, IA, NM, BC and DLT analyzed the data and contributed to

5 manuscript preparation; LB, EW, MP, ZM, HG conducted some of the laboratory experiments and

6 contributed to manuscript preparation; NM and JMB supervised the data analysis and contributed to

7 manuscript preparation; AG supervised the development of FGT-METAP and hosting the application

8 and contributed to manuscript preparation; LGB managed the clinical site for the WISH study,

9 collected some of the clinical data and contributed to manuscript preparation; HJ supervised the data

10 collection, analyzed the data and contributed to manuscript preparation; JSP was Principal

11 Investigator of the WISH cohort, supervised the collection of some of the data and contributed to

12 manuscript preparation; LM conceptualized the study, supervised the data collection, analyzed some

13 of the data and wrote the manuscript.

14

15 Acknowledgements

16 We would like to acknowledge the participants of the Women’s Initiative in Health Study.

17

18

19

20

27 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Figure legends

2

3 Figure 1 Bacterial relative abundance determined using metaproteomics versus 16S rRNA

4 gene sequencing by inflammation cytokine profile. Liquid chromatography-tandem mass

5 spectrometry was used to evaluate the metaproteome in lateral vaginal wall swabs from 113 women

6 from Cape Town, South Africa. Proteins were identified using MaxQuant and a custom database

7 generated using de novo sequencing to filter the UniProt database. was assigned using

8 UniProt and relative abundance of each taxon was determined by aggregating the intensity-based

9 absolute quantification (iBAQ) values of all proteins identified for each taxon. (a) Proteins identified

10 were assigned to the Eukaryota and Bacteria domains. Eukaryota proteins included those assigned to

11 the fungi kingdom and metazoan subkingdom, while the bacteria domain included actinobacteria,

12 , fusobacteria, bacteriodetes and gammaproteobacter phyla. (b) Distribution of taxa

13 identified is shown as a pie chart. (c) Number of proteins detected for taxa for which the greatest

14 number of proteins were identified. (d) Protein relative abundance per taxon for taxa with the highest

15 relative abundance. The relative abundance of the top 20 most abundant bacterial taxa identified

16 using (e) 16S rRNA gene sequencing and (f) metaproteomics is shown for all participants for whom

17 both 16S rRNA gene sequence data and metaproteomics data were generated (n=74). For 16S rRNA

18 gene sequence analysis, the V4 region was amplified and libraries sequenced on an Illumina MiSeq

19 platform. Inflammation groups were defined based on hierarchical followed by K-means clustering of

20 all women according to the concentrations of nine pro-inflammatory cytokine concentrations

21 [interleukin (IL)-1α, IL-1β, IL-6, IL-12p40, IL-12p70, tumor necrosis factor (TNF)-α, TNF-β, TNF-

22 related apoptosis-inducing ligand (TRAIL), interferon (IFN)-γ]. OTU: Operational taxonomic unit.

23

24 Figure 2 Weighted co-correlation network analysis of microbial, and human proteins. The

25 weighted correlation network analysis (WGCNA) R package was used to build a microbial-host

26 functional weighted co-correlation network using intensity-based absolute quantification (iBAQ)

27 values. a) Correlations between proteins and modules are shown by the heatmap, with positive

28 correlations shown in red and negative correlations shown in blue. Row sidebars represent the top

29 taxa and biological processes assigned to each of the proteins (no separate correlation coefficients

30 were calculated for taxa and biological processes). b) The protein dendrogram and module

28 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 assignment are displayed, with five modules identified using dynamic tree cut. c) Spearman’s rank

2 correlation was used to determine correlation coefficients between individual cytokines/chemokines

3 and module eigengenes (the first principal component of the expression matrix of the corresponding

4 module) for all samples. Finally, we reported the average Spearman’s correlation coefficients for all

5 pro-inflammatory cytokines and chemokines. d) Similarly, Spearman’s correlation coefficients were

6 calculated between module eigengenes and Nugent scores [bacterial vaginosis (BV)] and sexually

7 transmitted infections (STIs) as a categorical variable.

8

9 Figure 3 Comparison of bacterial, bacterial protein and bacterial function relative abundance

10 for prediction of genital inflammation. Random forest analysis was used to evaluate the accuracy

11 of (a) bacterial relative abundance (determined using 16S rRNA gene sequencing; n=74), (b)

12 bacterial functional predictions based on 16 rRNA data (n=74), (c) bacterial protein relative

13 abundance (determined using metaproteomics; n=74) and (d) bacterial protein molecular function

14 relative abundance (determined by metaproteomics and aggregation of protein values assigned to the

15 same gene ontology term; n=74) for determining the presence of genital inflammation (low, medium

16 and high groups). Inflammation groups were defined based on hierarchical followed by K-means

17 clustering of women according to the concentrations of nine pro-inflammatory cytokine concentrations

18 [interleukin (IL)-1α, IL-1β, IL-6, IL-12p40, IL-12p70, tumor necrosis factor (TNF)-α, TNF-β, TNF-

19 related apoptosis-inducing ligand (TRAIL), interferon (IFN)-γ]. The bars and numbers within the bars

20 indicate the relative importance of each taxon, protein or function based on the Mean Decrease in

21 Gini Value. The sizes of bars in each panel differ based on the length of the labels. (e) Each random

22 forest model was iterated 100 times for each of the input datasets separately and the distribution of

23 the out-of-bag error rates for the 100 models were then compared using t-tests. OTU: Operational

24 taxonomic unit.

25

26 Figure 4 Microbial biological process and cellular component gene ontologies associated with

27 genital inflammatory cytokine profiles. Unsupervised hierarchical clustering of aggregated

28 intensity-based absolute quantification (iBAQ) values for microbial protein (a) biological process (BP)

29 or (b) cellular component (CC) gene ontology (GO) relative abundance (n=113). GO relative

30 abundance was determined by metaproteomics and aggregation of microbial protein iBAQ values

29 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 assigned to the same GO term. The heatmaps show aggregated microbial GO relative abundance

2 and the bar graphs show fold changes in aggregated log2-transformed iBAQ values (LOGFC) for each

3 microbial protein with the same (a) BP or (b) CC GO in women with high versus low inflammation.

4 Inflammation groups were defined based on hierarchical followed by K-means clustering of women

5 according to the concentrations of nine pro-inflammatory cytokine concentrations [interleukin (IL)-1α,

6 IL-1β, IL-6, IL-12p40, IL-12p70, tumor necrosis factor (TNF)-α, TNF-β, TNF-related apoptosis-

7 inducing ligand (TRAIL), interferon (IFN)-γ]. Red bars indicate positive and blue bars indicate negative

8 fold changes in women with high versus low inflammation. False discovery rate adjusted p-values are

9 shown by dots, with red dots indicating low p-values. Red arrows indicate cell wall and membrane

10 processes and components. BV: Bacterial vaginosis. Proinflam cyt: Pro-inflammatory cytokine

11

12 Figure 5 Relative abundance of gene ontologies in independent samples and in inflammatory

13 versus non-inflammatory Lactobacillus isolates. (a) The top 14 microbial biological process (BP)

14 and cellular component (CC) gene ontology (GO) terms that distinguished women with low versus

15 high inflammation in the full cohort (n=113) were validated in an independent sample of ten women

16 from Cape Town, South Africa. Liquid chromatography-tandem mass spectrometry was used to

17 evaluate the metaproteome in lateral vaginal wall swabs from these women. Inflammation groups

18 were defined based on hierarchical followed by K-means clustering of these ten women according to

19 the concentrations of nine pro-inflammatory cytokine concentrations [interleukin (IL)-1α, IL-1β, IL-6,

20 IL-12p40, IL-12p70, tumor necrosis factor (TNF)-α, TNF-β, TNF-related apoptosis-inducing ligand

21 (TRAIL), interferon (IFN)-γ]. GO relative abundance was determined by aggregation of microbial

22 protein intensity-based absolute quantification (iBAQ) values assigned to a same GO term. The

23 relative abundance of the top BPs and CCs (except ATP binding cassette transporter complex which

24 was not detected in these samples) are shown as bar graphs, with blue indicating women with low

25 inflammation (n=5) and red indicating women with high inflammation (n=5). Welch’s t-test was used

26 for comparisons. *P<0.05. (b-i) Twenty-two Lactobacillus isolates that induced relatively high

27 inflammatory responses and 22 isolates that induced lower inflammatory responses were adjusted to

28 4.18 x 106 CFU/ml in bacterial culture medium and incubated for 24 hours under anaerobic

29 conditions. Following incubation, protein was extracted, digested and liquid chromatography-tandem

30 mass spectrometry was conducted. Raw files were processed with MaxQuant against a database

30 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 including the Lactobacillus genus and common contaminants. The iBAQ values for proteins with the

2 same gene ontologies were aggregated, log10-transformed and compared using Mann-Whitney U test.

3 Box-and-whisker plots show log10-transformed iBAQ values, with lines indicating medians and

4 whiskers extending to 1.5 times the interquartile range from the box. A false discovery rate step-down

5 procedure was used to adjust for multiple comparisons and adjusted p-values <0.05 were considered

6 statistically significant.

7

8 Figure 6 Longitudinal changes in FGT metaproteomic profiles. Liquid chromatography-tandem

9 mass spectrometry was used to evaluate the metaproteome in lateral vaginal wall swabs from 74

10 women from Cape Town, South Africa at two visits nine weeks (interquartile range: 9-11 weeks)

11 apart. (a) Sankey diagram was used to visualize changes in inflammatory cytokine profiles between

12 visits. (b) Principal component analysis (mixOmics R package) was used to group women based on

13 the log2-transformed intensity-based absolute quantification (iBAQ) values of all proteins identified at

14 both visits. Grouping is based on vaginal pro-inflammatory cytokine concentrations and each point

15 represents an individual woman. (c) Top 20 proteins (UniProt IDs) distinguishing women with and

16 without inflammation at both visits determined by moderated t-test (limma R package) and random

17 forest analysis (randomForest R package). (d) Top 10 taxa distinguishing women with and without

18 inflammation at both visits determined by moderated t-test (limma R package) and random forest

19 algorithm (randomForest R package). (e) Top 14 biological process and cellular component gene

20 ontology terms distinguishing women with and without inflammation at both visits determined by

21 moderated t-test (limma R package) and random forest algorithm (randomForest R package).

22 Positions of participants in each heatmap are fixed and the inflammation status of each participant

23 across the visits can be tracked using the row sidebars. Inflammation groups were defined based on

24 hierarchical followed by K-means clustering of nine pro-inflammatory cytokine concentrations

25 [interleukin (IL)-1α, IL-1β, IL-6, IL-12p40, IL-12p70, tumor necrosis factor (TNF)-α, TNF-β, TNF-

26 related apoptosis-inducing ligand (TRAIL), interferon (IFN)-γ]. ABC: ATP-binding cassette. Agg:

27 aggregated. BP: biological process. CC: cellular component. Infla: inflammation. PC: principal

28 component. PEP-PTS: phosphoenolpyruvate-dependent phosphotransferase system. RNDP

29 complex: ribonucleoside-diphosphate reductase complex.

30

31 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1

2

3 Supplementary table titles and legends

4 Additional file 1: Table S1 Demographic and clinical characteristics of study participants stratified by

5 female genital tract pro-inflammatory cytokine levels

6

7 Additional file 2: Table S2 List of protein BLAST results against UniProt and NCBI nr databases

8 (DBs). Liquid chromatography-tandem mass spectrometry was used to evaluate the metaproteome in

9 lateral vaginal wall swabs from 113 women from Cape Town, South Africa. Primary assigned taxa,

10 obtained based on MaxQuant version 1.5.7.4, are presented in the second column. The curated taxa

11 (third column) were obtained after considering the top similar and frequent BLAST hits for each

12 protein. IDs: UniProt Identity codes

13

14 Additional file 2: Table S3 List of differentially abundant taxa distinguishing women with low and

15 high inflammation. Liquid chromatography-tandem mass spectrometry was used to evaluate the

16 metaproteome in lateral vaginal wall swabs from 113 women from Cape Town, South Africa. The

17 false discovery rate (FDR) corrected p-values were obtained based on the aggregated log2-

18 transformed intensity-based absolute quantification of all proteins assigned to the same taxa applying

19 the moderated t-test (limma R package), after adjusting for confounding variables including age,

20 contraceptives, prostate specific antigen (PSA) and sexually transmitted infections (STIs). Taxa with

21 FDR adjusted p-values <0.05 and log2-transformed fold change > 1.2 or < -1.2 were considered as

22 differentially abundant taxa.

23

24 Additional file 2: Table S4 List of differentially abundant proteins distinguishing women with low,

25 medium and high inflammation. Liquid chromatography-tandem mass spectrometry was used to

26 evaluate the metaproteome in lateral vaginal wall swabs from 113 women from Cape Town, South

27 Africa. The false discovery rate (FDR) corrected p-values were obtained based on the log2-

28 transformed intensity-based absolute quantification of each protein applying the moderated t-test

29 (limma R package), after adjusting for confounding variables including age, contraceptives, prostate

30 specific antigen (PSA) and sexually transmitted infections (STIs). Proteins with FDR adjusted p-

32 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 values <0.05 and log2-transformed fold change > 1.2 or < -1.2 were considered as differentially

2 abundant proteins. IDs: UniProt Identity codes

3

4 Additional file 2: Table S5 List of differentially abundant molecular function (MF) gene ontology

5 terms distinguishing women with low and high inflammation. Liquid chromatography-tandem mass

6 spectrometry was used to evaluate the metaproteome in lateral vaginal wall swabs from 113 women

7 from Cape Town, South Africa. The false discovery rate (FDR) corrected p-values were obtained

8 based on the aggregated log2-transformed intensity-based absolute quantification of all proteins

9 assigned to the same MF term applying the moderated t-test (limma R package), after adjusting for

10 confounding variables including age, contraceptives, prostate specific antigen (PSA) and sexually

11 transmitted infections (STIs). Taxa with FDR adjusted p-values <0.05 and log2-transformed fold

12 change > 1.2 or < -1.2 were considered as significantly abundant MFs.

13

14 Additional file 1: Table S6 Details of MaxQuant version 1.5.7.4 parameters used for analysis of

15 metaproteomic data obtained from lateral vaginal wall swabs from 113 women from Cape Town,

16 South Africa.

17

33 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 References

2 1. UNAIDS Data 2019. Available at: https://www.unaids.org/sites/default/files/media_asset/2019-

3 UNAIDS-data_en.pdf. Accessed 20/02/20.

4 2. Masson L, Passmore JAS, Liebenberg LJ, Werner L, Baxter C, Arnold KB, et al. Genital

5 Inflammation and the Risk of HIV Acquisition in Women. Clin Infect Dis. 2015;61:260-269.

6 3. McKinnon LR, Liebenberg LJ, Yende-Zuma N, Archary D, Ngcapu S, Sivro A, et al. Genital

7 inflammation undermines the effectiveness of tenofovir gel in preventing HIV acquisition in women.

8 Nat Med. 2018;24:491.

9 4. Morrison C, Fichorova RN, Mauck C, Chen PL, Kwok C, Chipato T, et al. Cervical inflammation and

10 immunity associated with hormonal contraception, pregnancy, and HIV-1 seroconversion. J Acquir

11 Immune Defic Syndr. 2014;66:109-117.

12 5. Li Q, Estes JD, Schlievert PM, Duan L, Brosnahan AJ, Southern PJ, et al. Glycerol monolaurate

13 prevents mucosal SIV transmission. Nature. 2009;458:1034-1038.

14 6. Osborn L, Kunkel S, Nabel GJ. Tumor necrosis factor α and interleukin 1 stimulate the human

15 immunodeficiency virus enhancer by activation of the nuclear factor κB. Proc Natl Acad Sci U S A.

16 1989;86:2336-2340.

17 7. Arnold KB, Burgener A, Birse K, Romas L, Dunphy LJ, Shahabi K, et al. Increased levels of

18 inflammatory cytokines in the female reproductive tract are associated with altered expression of

19 proteases, mucosal barrier proteins, and an influx of HIV-susceptible target cells. Mucosal Immunol.

20 2016;9:194-205.

21 8. Burgener A, Rahman S, Ahmad R, Lajoie J, Ramdahin S, Mesa C, et al. Comprehensive proteomic

22 study identifies serpin and cystatin antiproteases as novel correlates of HIV-1 resistance in the

23 cervicovaginal mucosa of female Sex workers. J Proteome Res. 2011;11:5139-5149.

24 9. Masson L, Mlisana K, Little F, Werner L, Mkhize NN, Ronacher K, et al. Defining genital tract

25 cytokine signatures of sexually transmitted infections and bacterial vaginosis in women at high risk of

26 HIV infection: A cross-sectional study. Sex Transm Infect. 2014;90:580-587.

27 10. Lennard K, Dabee S, Barnabas SL, Havyarimana E, Blakney A, Jaumdally SZ, et al. Microbial

28 composition predicts genital tract inflammation and persistent bacterial vaginosis in South African

29 adolescent females. Infect Immun. 2018;86:e00410-17.

34 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 11. Anahtar MN, Byrne EH, Doherty KE, Bowman BA, Yamamoto HS, Soumillon M, et al.

2 Cervicovaginal bacteria are a major modulator of host inflammatory responses in the female genital

3 tract. Immunity. 2015;42: 965-976.

4 12. Deese J, Masson L, Miller W, Cohen M, Morrison C, Wang M, et al. Injectable progestin-only

5 contraception is associated with increased levels of pro-inflammatory cytokines in the female genital

6 tract. Am J Reprod Immunol. 2015;74:357-367.

7 13. Gosmann C, Anahtar MN, Handley SA, Farcasanu M, Abu-Ali G, Bowman BA, et al.

8 Lactobacillus-deficient cervicovaginal bacterial communities are associated with increased HIV

9 acquisition in young South African women. Immunity. 2017;46:29-37.

10 14. Chapot-Chartier MP, Kulakauskas S. Cell wall structure and function in lactic acid bacteria. Microb

11 Cell Fact. 2014;13:S9.

12 15. Chetwin E, Manhanzva MT, Abrahams AG, Froissart R, Gamieldien H, Jaspan H, et al.

13 Antimicrobial and inflammatory properties of South African clinical Lactobacillus isolates and vaginal

14 probiotics. Sci Rep. 2019; 9:1-15.

15 16. Hearps AC, Tyssen D, Srbinovski D, Bayigga L, Diaz DJD, Aldunate M, et al. Vaginal lactic acid

16 elicits an anti-inflammatory response from human cervicovaginal epithelial cells and inhibits

17 production of pro-inflammatory mediators associated with HIV acquisition. Mucosal Immunol.

18 2017;10:1480-1490.

19 17. McKinnon LR, Achilles SL, Bradshaw CS, Burgener A, Crucitti T, Fredricks DN, et al. The

20 Evolving Facets of Bacterial Vaginosis: Implications for HIV Transmission. AIDS Res. Hum.

21 Retroviruses. 2019.

22 18. Ravel J, Gajer P, Abdo Z, Schneider GM, Koenig SSK, McCulle SL, et al. Vaginal microbiome of

23 reproductive-age women. Proc Natl Acad Sci U S A. 2011;35:219-228.

24 19. Xiong W, Abraham PE, Li Z, Pan C, Hettich RL. Microbial metaproteomics for characterizing the

25 range of metabolic functions and activities of human gut microbiota. Proteomics. 2015;15:3424-38.

26 20. Klatt NR, Cheu R, Birse K, Zevin AS, Perner M, Noël-Romas L, et al. Vaginal bacteria modify HIV

27 tenofovir microbicide efficacy in African women. Science. 2017;356:938-945.

28 21. Zevin AS, Xie IY, Birse K, Arnold K, Romas L, Westmacott G, et al. Microbiome Composition and

29 Function Drives Wound-Healing Impairment in the Female Genital Tract. PLoS Pathog. 2016;12:

30 e1005889.

35 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 22. Barnabas SL, Dabee S, Passmore JAS, Jaspan HB, Lewis DA, Jaumdally SZ, et al. Converging

2 epidemics of sexually transmitted infections and bacterial vaginosis in southern African female

3 adolescents at risk of HIV. Int J STD AIDS. 2018;29:531-539.

4 23. Dabee S, Barnabas SL, Lennard KS, Jaumdally SZ, Gamieldien H, Balle C, et al. Defining

5 characteristics of genital health in South African adolescent girls and young women at high risk for

6 HIV infection. PLoS One. 2019;14:e0213975.

7 24. Fernandez EM, Valenti V, Rockel C, Hermann C, Pot B, Boneca IG, et al. Anti-inflammatory

8 capacity of selected lactobacilli in experimental colitis is driven by NOD2-mediated recognition of a

9 specific peptidoglycan-derived muropeptide. Gut. 2011;60:1050-1059.

10 25. Manhanzva MT, Abrahams AG, Gamieldien H, Froissart R, Jaspan H, Jaumdally SZ, et al.

11 Inflammatory and antimicrobial properties differ between vaginal Lactobacillus isolates from South

12 African women with non-optimal versus optimal microbiota. Sci Rep. 2020;10:1-13.

13 26. Macho Fernandez E, Pot B, Grangette C. Beneficial effect of probiotics in IBD: Are peptidogycan

14 and NOD2 the molecular key effectors? Gut Microbes. 2011;2:280-6.

15 27. Balle C, Lennard K, Dabee S, Barnabas SL, Jaumdally SZ, Gasper MA, et al. Endocervical and

16 vaginal microbiota in South African adolescents with asymptomatic Chlamydia trachomatis infection.

17 Sci Rep. 2018;8:1-9.

18 28. Lagier JC, Dubourg G, Million M, Cadoret F, Bilen M, Fenollar F, et al. Culturing the human

19 microbiota and culturomics. Nat. Rev. Microbiol. 2018;16:540-550.

20 29. Holm JB, France M, Ma B, Robinson CK, McComb E, Brotman RM, et al. Comparative

21 metagenome-assembled genome analysis of Lachnovaginosum genomospecies, formerly known as

22 BVAB1. bioRxiv. 2019;657197.

23 30. Huffnagle GB, Noverr MC. The emerging world of the fungal microbiome. Trends Microbiol. 2013.

24 31. Bianco L, Perrotta G. Methodologies and perspectives of proteomics applied to filamentous fungi:

25 From sample preparation to secretome analysis. Int. J. Mol. Sci. 2015;21:334-41.

26 32. Verstraelen H, Verhelst R, Claeys G, De Backer E, Temmerman M, Vaneechoutte M. Longitudinal

27 analysis of the vaginal microflora in pregnancy suggests that L. crispatus promotes the stability of the

28 normal vaginal microflora and that L. gasseri and/or L. iners are more conducive to the occurrence of

29 abnormal vaginal microflora. BMC Microbiol. 2009;9:116.

36 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 33. van Houdt R, Ma B, Bruisten SM, Ravel AGCLSJ, Vries HJC de. Lactobacillus iners-dominated

2 vaginal microbiota is associated with increased susceptibility to Chlamydia trachomatis infection in

3 Dutch women: a case–control study. Sex Transm Infect. 2018;94(2):117-23.

4 34. Potgieter MG, Nel AJM, Tabb DL, Fortuin S, Garnett S, Blackburn JM, et al.

5 MetaNovo: a probabilistic approach to peptide and polymorphism discovery in complex

6 mass spectrometry datasets. bioRxiv. 2019;605550.

7 35. Afiuni-Zadeh S, Boylan KLM, Jagtap PD, Griffin TJ, Rudney JD, Peterson ML, et al. Evaluating

8 the potential of residual Pap test fluid as a resource for the metaproteomic analysis of the cervical-

9 vaginal microbiome. Sci Rep. 2018;8:10868.

10 36. Smyth GK. limma: Linear Models for Microarray Data. Bioinforma Comput Biol Solut Using R

11 Bioconductor. 2005.

12 37. Rohart F, Gautier B, Singh A, Lê Cao KA. mixOmics: An R package for ‘omics feature selection

13 and multiple data integration. PLoS Comput Biol. 2017;13:e1005752.

14 38. Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in

15 multidimensional genomic data. Bioinformatics. 2016;32:2847-9.

16 39. Langfelder P, Horvath S. WGCNA: An R package for weighted correlation network analysis. BMC

17 Bioinformatics. 2008;9:559.

18 40. Liaw A, Wiener M. Classification and Regression by randomForest. R News. 2002;2:18-22.

19 41. Pearce MM, Hilt EE, Rosenfeld AB, Zilliox MJ, Thomas-White K, Fok C, et al. The female urinary

20 microbiome: A comparison of women with and without urgency urinary incontinence. MBio. 2014;5,

21 e01283-14.

22 42. Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. DADA2: High-

23 resolution sample inference from Illumina amplicon data. Nat Methods. 2016;13:581.

24 43. Narayan NR, Weinmaier T, Laserna-Mendieta EJ, Claesson MJ, Shanahan F, Dabbagh K, et al.

25 Piphillin predicts metagenomic composition and dynamics from DADA2-corrected 16S rDNA

26 sequences. BMC Genomics. 2020;21:1-12.

27

37 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

c 1600 1422 a 1400 1200 proteins 1000 of 800 600 307 291 400 255 214 104 103 200 82 63 61 56 52 52 29 27 26 21 21 0 Total number Others

Homo sapiens Pseudomonas Prevotella amnii Sneathia amnii Clostridium sp. Megasphaera Sp. LactobacillusMobiluncus iners mulieris Atopobium vaginae Gardnerella vaginalis Lactobacillus jensenii Prevotella sp.Prevotella S7-1-8 timonensisPrevotella sp. BV3P1 Megasphaera sp. UPII Fusobacterium nucleatum

d 100.0 78.2 80.0

proteins (%) 60.0 r 40.0

b 20.0 7.0 3.8 3.6 1.4 1.3 0.7 0.7 0.4 0.3 0.3 0.3 0.2 0.2 0.1 0.1 1.5 Fungi 0.0 Human

1% of Abundance Bacteria 44% 55% Others

Homo sapiens Prevotella amnii Sneathia amnii Escherichia coli Lactobacillus iners Megasphaera sp. AtopobiumMobiluncus vaginae mulieris Gardnerella vaginalisLactobacillus jensenii PrevotellaPrevotella sp. BV3P1 sp. S7-1-8 Lactobacillus crispatus Prevotella timonensis Megasphaera genomosp uncultured Clostridium sp.

100% e LOW MEDIUM HIGH Other 100% Mageeibacillus indolicus 90% 100% OtherDialister 90% MageeibacillusAtopobium indolicusvaginae Dialister AtopobiumSneathia vaginae sanguinegens 80% 80% 80% SneathiaSneathia sanguinegens amnii/sanguinegens SneathiaMegasphaera amnii/sanguinegens 70% MegasphaeraPrevotella disiens PrevotellaPrevotella disiens timonensis 70% 60%60% PrevotellaPrevotella timonensis OTU336 PrevotellaPrevotella OTU336 OTU9 50% Prevotella OTU9 PrevotellaPrevotella amnii amnii OTU16 OTU16 60% Prevotella amnii OTU45 40%40% Prevotella amnii OTU45

Relativeabundance GardnerellaGardnerella vaginalis vaginalis OTU21 OTU21 Gardnerella vaginalis OTU7 30% Gardnerella vaginalis OTU7 50% GardnerellaGardnerella vaginalis vaginalis OTU2 OTU2 BVAB1 20%20% Lachnovaginosum genomospecies LactobacillusLactobacillus OTU12 OTU12 Lactobacillus OTU29 Relative abundance Relative 10% Lactobacillus OTU29 40% LactobacillusLactobacillus crispatus crispatus Lactobacillus iners

Relativeabundance 0% 0% Lactobacillus iners

30%100% LOW MEDIUM HIGH Other f 100% Fusarium verticillioides 100% Other 20%90% FusariumEscherichia verticillioides coli 90% Escherichia coli Pseudomonas fluorescens fluorescens group group Pseudomonas 80% Mobiluncus mulieris 10%80% 80% AtopobiumMobiluncus vaginae mulieris SneathiaAtopobium amnii vaginae 70% Megasphaera sp. UPII 135-E MegasphaeraSneathia amnii genomosp. type_1 Prevotella timonensis PrevotellaMegasphaera sp. BV3P1 sp. UPII 135-E 70% 60% 0% 60% Prevotella sp. S7-1-8 PrevotellaMegasphaera amnii genomosp. type_1 GardnerellaPrevotella vaginalis timonensis 50% unculturedPrevotella Ruminococcus sp. BV3P1 sp. 60% uncultured Clostridium sp. LactobacillusPrevotella johnsoniisp. S7-1-8 40%40% Lactobacillus jensenii

Relativeabundance LactobacillusPrevotella crispatusamnii Lactobacillus iners 30% Gardnerella vaginalis 50% uncultured Ruminococcus sp. 20%20% uncultured Clostridium sp. Lactobacillus johnsonii

40% abundance Relative 10% Lactobacillus jensenii

Relativeabundance 0% Lactobacillus crispatus 0% Lactobacillus iners 30%

20%

10%

0% grey brown turquoise blue yellow

Modules

Species

BP grey Correlation Coeficients brown Modules − − 0 0.5 1 yellow turquoise grey brown blue

turquoise 1 0.5 blue yellow a

Modules grey Modules brown grey Speciesbrown turquoise Species

turquoise bioRxiv preprint Megasphaera genomosp. type_1 Lactobacillus kefiranofaciens Lactobacillus johnsonii Lactobacillus jensenii Lactobacillus iners Lactobacillus helveticus Lactobacillus hamsteri Lactobacillus delbrueckii Lactobacillus crispatus Homo sapiens Gardnerella vaginalis Atopobium vaginae blue Prevotella timonensis Prevotella amnii blue yellow BP Correlation Coeficients yellow Modules − − 0 0.5 1 yellow turquoise grey brown blue 1 0.5 Modules Modules (which wasnotcertifiedbypeerreview)istheauthor/funder.Allrightsreserved.Noreuseallowedwithoutpermission. doi: Species Species https://doi.org/10.1101/2020.03.10.986646

BP BP Modules Correlation Coeficients Taxa Species Correlation Coeficients Modules BP Biological L_gluconeogenesis K_cell wall organization/regulationofcellshape K_cell wall organization J_cell redox homeostasis I_ATP synthesiscoupled protontransport H_wound healing G_cellular responsetohypoxia F_carbohydrate metabolicprocess F_carbohydrate derivative biosyntheticprocess E_regulation ofcelladhesion E_positive regulationofsubstrate adhesion E_cell E_cell E_cell adhesion E_adherens junctionorganization response D_negative regulationofinflammatory D_negative regulationofimmune response C_neutrophil degranulation B_antimicrobial humoral response B_antimicrobial humoral immune response response A_regulation ofacuteinflammatory A_positive regulationofinterleukin A_positive regulationofhumoral immune response A_innate immune response response/innate immune response A_inflammatory response A_inflammatory A_immune systemprocess response A_immune response/inflamatory A_immune response response toantigenicstimulus A_chronic inflammatory A_cellular responsetointerleukin A_cellular responsetointerferon A_adaptive immune response/neutrophil degranulation A_acute S_NA R_biosynthetic process Q_response tostress P_PEP O_translation O_protein folding N_metabolic process M_lipoteichoic acidbiosyntheticprocess L_glycolytic process L_glucose metabolicprocess Prevotella timonensis Prevotella amnii Megasphaera genomosp. type_1 Lactobacillus kefiranofaciens Lactobacillus johnsonii Lactobacillus jensenii Lactobacillus iners Lactobacillus helveticus Lactobacillus hamsteri Lactobacillus delbrueckii Lactobacillus crispatus Homo sapiens Gardnerella vaginalis Atopobium vaginae yellow turquoise grey brown blue − − 0 0.5 1 − − 0 0.5 1 yellow turquoise grey brown blue 1 0.5 1 0.5 − − − cell junctionassembly cell adhesion − PTS phase response/responsetocytokine process Species Species Prevotella timonensis Prevotella amnii Megasphaera genomosp. type_1 Lactobacillus kefiranofaciens Lactobacillus johnsonii Lactobacillus jensenii Lactobacillus iners Lactobacillus helveticus Lactobacillus hamsteri Lactobacillus delbrueckii Lactobacillus crispatus Homo sapiens Gardnerella vaginalis Atopobium vaginae Modules − ; − Prevotella timonensis Prevotella amnii Megasphaera genomosp. type_1 Lactobacillus kefiranofaciens Lactobacillus johnsonii Lactobacillus jensenii Lactobacillus iners Lactobacillus helveticus Lactobacillus hamsteri Lactobacillus delbrueckii Lactobacillus crispatus Homo sapiens Gardnerella vaginalis Atopobium vaginae gamma this versionpostedSeptember27,2020. b − 1 6 production BP

Module colors 2 RHeight E_cell adhesion E_adherens junctionorganization response D_negative regulationofinflammatory D_negative regulationofimmune response C_neutrophil degranulation B_antimicrobial humoral response B_antimicrobial humoral immune response response A_regulation ofacuteinflammatory A_positive regulationofinterleukin A_positive regulationofhumoral immune response A_innate immune response response/innateimmune response A_inflammatory response A_inflammatory A_immune systemprocess response A_immune response/inflamatory A_immune response responsetoantigenicstimulus A_chronic inflammatory A_cellular responsetointerleukin A_cellular responsetointerferon A_adaptive immune response/neutrophildegranulation A_acute S_NA R_biosynthetic process Q_response tostress P_PEP O_translation O_protein folding N_metabolic process M_lipoteichoic acidbiosyntheticprocess L_glycolytic process L_glucose metabolicprocess L_gluconeogenesis K_cell wall organization/regulationofcellshape K_cell wall organization J_cell redox homeostasis I_ATP synthesiscoupledprotontransport H_wound healing G_cellular responsetohypoxia F_carbohydrate metabolicprocess F_carbohydrate derivative biosyntheticprocess E_regulation ofcelladhesion E_positive regulationofsubstrate adhesion E_cell E_cell MEturquoise

0.5 0.6 0.7 0.8 0.9 1.0 MEturquoise MEyellow MEbrown − − MEyellow MEbrown MEgrey − MEblue Proinflammatory cytokinescell junctionassembly cell adhesion − PTS MEgrey MEblue phase response/responsetocytokine c d

Modules Modules Proinflammatory Module Module (5e (5e (4e (7e (2e (4e (5e (5e fastcluster::hclust (*,"average") cytokines − − 0.45 0.44 0.43 (0.01) − − 0.44 0.52 Cluster Dendrogram (0.2) 0.31 0.19 0.88 BP 0.62 0.77 − − − − −

BV −

BV − − − − gamma 06) 08) 04) 06) 04) as.dist(dissTom) − 1 07) 12) 20) BP E_regulation ofcelladhesion E_positive regulationof substrate adhesion E_cell E_cell E_cell adhesion E_adherens junctionorganization response D_negative regulationofinflammatory D_negative regulationofimmune response C_neutrophil degranulation B_antimicrobial humoral response B_antimicrobial humoral immune response response A_regulation ofacuteinflammatory A_positive regulationofinterleukin A_positive regulationofhumoral immune response A_innate immune response response/innateimmune response A_inflammatory response A_inflammatory A_immune systemprocess response A_immune response/inflamatory A_immune response responsetoantigenicstimulus A_chronic inflammatory A_cellular responsetointerleukin A_cellular responsetointerferon A_adaptive immune response/neutrophildegranulation A_acute S_NA R_biosynthetic process Q_response tostress P_PEP O_translation O_protein folding N_metabolic process M_lipoteichoic acidbiosyntheticprocess L_glycolytic process L_glucose metabolicprocess L_gluconeogenesis K_cell wall organization/regulation ofcellshape K_cell wall organization J_cell redox homeostasis I_ATP synthesiscoupledprotontransport H_wound healing G_cellular responsetohypoxia F_carbohydrate metabolicprocess F_carbohydrate derivative biosyntheticprocess 6 production − − trait relationships C_neutrophil degranulation B_antimicrobial humoral response B_antimicrobial humoral immune response response A_regulation ofacuteinflammatory A_positive regulationofinterleukin A_positive regulationofhumoral immune response A_innate immune response response/innateimmune response A_inflammatory response A_inflammatory A_immune systemprocess response A_immune response/inflamatory A_immune response responsetoantigenicstimulus A_chronic inflammatory A_cellular responsetointerleukin A_cellular responsetointerferon A_adaptive immune response/neutrophildegranulation A_acute S_NA R_biosynthetic process Q_response tostress P_PEP O_translation O_protein folding N_metabolic process M_lipoteichoic acidbiosyntheticprocess L_glycolytic process L_glucose metabolicprocess L_gluconeogenesis K_cell wall organization/regulationofcellshape K_cell wall organization J_cell redox homeostasis I_ATP synthesiscoupledprotontransport H_wound healing G_cellular responsetohypoxia F_carbohydrate metabolic process F_carbohydrate derivative biosyntheticprocess E_regulation ofcelladhesion E_positive regulationofsubstrate adhesion E_cell E_cell E_cell adhesion E_adherens junctionorganization response D_negative regulationofinflammatory D_negative regulationofimmune response The copyrightholderforthispreprint trait relationships Chemokines − − − cell junctionassembly cell adhesion −

PTS Chemokines phase response/responsetocytokine (0.004) (0.001) − − (0.07) (0.04) − − − − cell junctionassembly cell adhesion (0.2) 0.43 0.42 0.21 0.054 0.068 0.046 − 0.33 0.37

STIs PTS (0.2) (0.4) (0.6) (0.6) (0.7) 0.16 STIs 0.099 phase response/responsetocytokine − − 0 0.5 1 − − 0 0.5 1

1 0.5

1 0.5 − −

gamma −

1

6 production − − gamma − 1 6 production bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. a b 0.5 0.39 Prevotella Prevotella 0.5 Sphingolipid metabolism 0.51 0.42 Mycoplasma hominis Mycoplasma hominis 0.51 Biosynthesis of ansamycins 0.52 0.43 Fastidiosipila Fastidiosipila 0.52 Neomycin kanamycin and gentamicin biosynthesis 0.52 0.44 Prevotella(OTU336) Prevotella(OTU336) 0.52 Primary immunodeficiency 0.53 0.45 Megasphaera Megasphaera 0.53 Prodigiosin biosynthesis 0.54 0.46 Dialister micraerophilusDialister micraerophilus 0.54 Chloroalkane and chloroalkene degradation 0.54 0.47 Peptostreptococcus anaerobiusPeptostreptococcus anaerobius 0.54 Tuberculosis 0.62 0.49 Dialister Dialister 0.62 PPAR signaling 0.65 Ureaplasma parvum Ureaplasma parvum 0.65 Epithelial cell signaling 0.52 0.67 Gardnerella(OTU7) Gardnerella(OTU7) 0.67 Alanine aspartate and glutamate metabolism 0.52 0.67 Gemella asaccharolyticaGemella asaccharolytica 0.67 Photosynthesis 0.52 0.67 Lactobacillus Lactobacillus 0.67 GABAergic synapse 0.53 0.75 Corynebacterium Corynebacterium 0.75 Oxidative phosphorylation 0.53 1.04 Lachnovaginosum BVAB1Lachnovaginosum BVAB1 1.04 Sulfur relay system 0.63 1.18 Aerococcus christenseniiAerococcus christensenii 1.18 Secondary bile acid biosynthesis 0.64 1.71 Sneathia amnii Sneathia amnii 1.71 Primary bile acid biosynthesis 0.65 1.89 Gardnerella vaginalis Gardnerella vaginalis 1.89 Carbon metabolism 0.67 2.23 Lactobacillus iners Lactobacillus iners 2.23 Tyrosine metabolism 1.03 2.4 Atopobium vaginae Atopobium vaginae 2.4 Biofilm formation 1.32 2.64 Sneathia sanguinegensSneathia sanguinegens 2.64 Prolactin signaling 1.63 −2 −1 −2 0 −1 10 1 2 2 −1 0 1 c d 30S ribosomal protein S13 0.44 ATP binding 0.47 60 kDa chaperonin 0.46 D−alanine−poly(phosphoribitol) ligase activity 0.47 Formate C−acetyltransferase 0.47 rRNA binding 0.52 Transcription elongation factor GreA 0.47 aminoacyl−tRNA editing activity 0.53 Elongation factor Tu 0.48 adenylate kinase activity 0.54 Threonine−−tRNA ligase 0.49 5S rRNA binding 0.62 Fructose−1,6−bisphosphate aldolase, class II 0.53 GTP binding 0.63 Phosphoglycerate kinase 0.56 ribosome binding 0.63 2,3−bisphosphoglycerate 0.57 phosphoglucosamine mutase activity 0.64 Glutamine synthetase 0.59 translation elongation factor activity 0.64 Triosephosphate isomerase 0.75 structural constituent of ribosome 0.65 Elongation factor Tu(C8PAS6) 0.9 phosphoglycerate kinase activity 0.68 Actin, cytoplasmic 2 0.9 amidase activity 0.7 Phosphocarrier, HPr family 0.91 DNA binding 0.77 ATP synthase subunit alpha 0.92 adenylosuccinate synthase activity 0.84 Pyrophosphate phospho−hydrolase 1.05 hydrolase activity 1.18 Cold−shock DNA−binding domain protein 1.28 glutamate−ammonia ligase activity 1.22 Phosphoglycerate kinase(A0A0E9DXN4) 1.72 GTPase activity 1.51 Glyceraldehyde−3−phosphate dehydrogenase 1.87 beta−phosphoglucomutase activity 2.92 Chaperone protein DnaK 2.05 oxidoreductase activity 3.27 −2 −1 0 1 2 −2 0 2

p=9.4E-25 e p=6.4E-101 p=1.7E-94 p=0.79 50 45 40 35 OOB estimate of error rate (%) OOB estimate of error rate OOB estimate of error rate (%) rate error of estimate OOB OTUOTU FunctionOTU- basedbased function on OTUs ProteinProtein FunctionProtein based-based on function proteins bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 2 1 2 a b ProinflaProinflam cyt ProinflaProinflam cyt BVBV BVBV phosphoenolpyruvate−dependent sugar phosphotransferase system cell wall organization large ribosomal subunit transcription, DNA−templated regulation of cell shape peptidoglycan biosynthetic process small ribosomal subunit cell division nitrogen fixation glutamine biosynthetic process lipoteichoic acid biosynthetic process peptidoglycan−based cell wall nucleoside metabolic process 'de novo' CTP biosynthetic process regulation of DNA−templated transcription, elongation S−layer cell cycle 'de novo' AMP biosynthetic process 'de novo' UMP biosynthetic process ribonucleoside−diphosphate reductase complex 1 2 glutamine metabolic process threonyl−tRNA aminoacylation NAD biosynthetic process regulation of translation glutamyl−tRNA(Gln) amidotransferase complex deoxyribonucleotide biosynthetic process 'de novo' pyrimidine nucleobase biosynthetic process D−gluconate metabolic process cell wall tRNA threonylcarbamoyladenosine modification carbohydrate derivative biosynthetic process protein initiator methionine removal tetrahydrofolate interconversion membrane isoprenoid biosynthetic process Proinfla DNA replication protein dephosphorylation GMP reductase complex Proinfla glycine biosynthetic process from serine 200 nucleotide biosynthetic process High BV selenocysteinyl−tRNA(Sec) biosynthetic process Proinfla 150 Low glycerol−3−phosphate dehydrogenase complex selenocysteine biosynthetic process 200 100 seryl−tRNA aminoacylation BV cell wall organization High tRNA processing 150 Low 50 Positive transcription, DNA−templated glycerol−3−phosphate catabolic process endoplasmic reticulum−Golgi intermediate compartment glycerol−3−phosphate biosynthetic process 100 0 Not regulation of cell shape phospholipid biosynthetic process BV glycerophospholipid metabolic process Negative 50 trans−Golgi network peptidoglycan biosynthetic process DNA−templated transcription, elongation Positive Intermediate alanyl−tRNA aminoacylation 0 Not phosphoenolpyruvate−dependent sugar phosphotransferase system actin filament reorganization involved in cell cycle ER to Golgi vesicle−mediated transport Negative cytoskeleton cell division retrograde vesicle−mediated transport, Golgi to ER Intermediate nitrogen fixation Golgi to plasma membrane protein transport dTTP biosynthetic process RQC complex glutamine biosynthetic process peptide metabolic process tyrosyl−tRNA aminoacylation lipoteichoic acid biosynthetic process prolyl−tRNA aminoacylation dTMP biosynthetic process Cdc48p−Npl4p−Vms1p AAA ATPase complex nucleoside metabolic process D−alanine biosynthetic process one−carbon metabolic process 'de novo' CTP biosynthetic process tetrahydrofolate biosynthetic process cytosolic large ribosomal subunit cellular phosphate ion homeostasis regulation of DNA−templated transcription, elongation negative regulation of phosphate metabolic process cell cycle phosphate ion transport removal of superoxide radicals nuclear chromatin 'de novo' AMP biosynthetic process diaminopimelate biosynthetic process ribophagy 'de novo' UMP biosynthetic process establishment of endoplasmic reticulum localization Doa10p ubiquitin ligase complex positive regulation of histone H2B ubiquitination glutamine metabolic process endoplasmic reticulum membrane fusion retrograde protein transport, ER to cytosol Hrd1p ubiquitin ligase ERAD−L complex threonyl−tRNA aminoacylation ER−associated ubiquitin−dependent protein catabolic process NAD biosynthetic process ER−associated misfolded protein catabolic process cytoplasm−proteasomal ubiquitin−dependent protein catabolic process regulation of translation nucleus−proteasomal ubiquitin−dependent protein catabolic process VCP−NPL4−UFD1 AAA ATPase complex sterol regulatory element binding protein cleavage deoxyribonucleotide biosynthetic process cellular protein complex disassembly ribosome−associated ubiquitin−dependent protein catabolic process mitotic spindle midzone 'de novo' pyrimidine nucleobase biosynthetic process positive regulation of protein localization to nucleus negative regulation of telomerase activity D−gluconate metabolic process nonfunctional rRNA decay 1 2 tRNA threonylcarbamoyladenosine modification mitochondria−associated ubiquitin−dependent protein catabolic process ATP−binding cassette (ABC) transporter complex log2−iBAQ BP piecemeal microautophagy of nucleusProinfla SCF complex disassembly in response to cadmium stress 0 1 2 40 carbohydrate derivative biosynthetic process 0 log2−iBAQ BP protein localization to Golgi apparatusProinfla 60 40 20 actin cytoskeleton organization High − − − protein initiator methionine removal mitotic sister chromatid separation 0.02 0.04 3040 mitotic spindle disassembly LOGFC Adj.P.Val tetrahydrofolate interconversion AMPactin cytoskeletonsalvage organization sister chromatid biorientation LowHigh 30 isoprenoid biosynthetic process ATP metabolic process 20 angiogenesisAMP salvage malate metabolic process MediumLow Proinfla DNA replication response to oxidative stress 0 20 antibacterialangiogenesis humoral response Medium 0 50 10 protein dephosphorylation BV 50 −

Proinfla 0.02 0.04 BV 10 glycine biosynthetic process from serine antimicrobialantibacterial humoralhumoral responseimmune response BV value 0 Aggregated BVPositive LOGFC Adj.P.Val - BV nucleotide biosynthetic process antimicrobial humoral responseimmune response LOGFC Species0 LOG iBAQ value NotPositive P selenocysteinyl−tRNA(Sec) biosynthetic process apoptoticantimicrobial process humoral response 2 Proinfla SpeciesAerococcus christenseniiselenocysteine biosynthetic process NegativeNot Arp2/3apoptotic complex process−mediated actin nucleation 200 AtopobiumAerococcus vaginae christenseniiseryl−tRNA aminoacylation High IntermediateNegative value tRNA processing ATPArp2/3 synthesis complex coupled−mediated proton actin transport nucleation - P CoriobacterialesAtopobium vaginae bacterium DNF00809 150 Intermediate LOGFC glycerol−3−phosphate catabolic process barbedATP synthesis−end actin coupled filament proton capping transport Low NA GardnerellaCoriobacteriales vaginalisglycerol bacterium−3−phosphate biosyntheticDNF00809 process 1 2 biosyntheticbarbed−end processactin filament capping 100 Proinflammatory log2HomoGardnerella−iBAQ sapiens vaginalis (Human)phospholipid biosynthetic process BP BV Proinfla 40 carbohydratebiosynthetic process derivative biosynthetic process LactobacillusHomo sapiens crispatus (Human)glycerophospholipid metabolic process actin cytoskeleton organization 50 PositivecytokinesHigh 3 1 2 30 DNA−templated transcription, elongation carbohydrate metabolicderivative biosyntheticprocess process log2Lactobacillus−iBAQ delbrueckiicrispatusalanyl−tRNA aminoacylation BPAMP salvage ProinflaLow 3 40 carboxyliccarbohydrate acid metabolic metabolic process process 0 Not 20Lactobacillusactin filament hamsteridelbrueckii reorganization involved in cell cycle actinangiogenesis cytoskeleton organization HighMedium Proinfla cellcarboxylic adhesion acid metabolic process 1030Lactobacillus helveticushamsteriER to Golgi vesicle (Lactobacillus−mediated transport suntoryeus) AMPantibacterial salvage humoral response NegativeLow retrograde vesicle−mediated transport, Golgi to ER cell cycleadhesion BV BV 20Lactobacillus inershelveticus (Lactobacillus suntoryeus) angiogenesisantimicrobial humoral immune response IntermediateMedium 0 Golgi to plasma membrane protein transport cell divisioncycle Positive Proinfla Lactobacillus jenseniiiners dTTP biosynthetic process antibacterialantimicrobial humoralhumoral responseresponse Species10 cell proliferationdivision BVNot BV Lactobacillus johnsoniijensenii peptide metabolic process antimicrobialapoptotic process humoral immune response 0Aerococcus christenseniityrosyl−tRNA aminoacylation cell redoxproliferation homeostasis PositiveNegative Lactobacillus kefiranofaciensjohnsonii antimicrobialArp2/3 complex humoral−mediated response actin nucleation SpeciesAtopobium vaginae prolyl−tRNA aminoacylation cell wallredox organization homeostasis NotIntermediate MegasphaeraLactobacillus kefiranofaciens genomosp. type_1 apoptoticATP synthesis process coupled proton transport AerococcusCoriobacteriales christensenii bacteriumdTMP biosyntheticDNF00809 process cell− wallcell organizationadhesion Negative PrevotellaMegasphaera amnii genomosp.D−alanine type_1 biosynthetic process Arp2/3barbed −complexend actin−mediated filament cappingactin nucleation AtopobiumGardnerella vaginae vaginalis cellularcell−cell iron adhesion ion homeostasis Intermediate Prevotella sp.amnii BV3P1one−carbon metabolic process ATPbiosynthetic synthesis process coupled proton transport CoriobacterialesHomo sapiens (Human) bacteriumtetrahydrofolate biosyntheticDNF00809 process chromosomecellular iron ion condensation homeostasis Prevotella timonensissp. BV3P1 barbedcarbohydrate−end actin derivative filament biosynthetic capping process GardnerellaLactobacillus vaginalis crispatuscellular phosphate ion homeostasis cyanatechromosome catabolic condensation process 3 RhizoctoniaPrevotellanegative timonensis solani regulation of phosphate metabolic process biosyntheticcarbohydrate process metabolic process HomoLactobacillus sapiens delbrueckii (Human) Dcyanate−gluconate catabolic metabolic process process RoseburiaRhizoctonia inulinivorans solani phosphate ion transport carbohydratecarboxylic acid derivative metabolic biosynthetic process process Lactobacillus crispatushamsteriremoval of superoxide radicals deD− gluconatenovo' AMP metabolic biosynthetic process process

3 SchizosaccharomycesRoseburia inulinivorans japonicus yFS275 carbohydratecell adhesion metabolic process Lactobacillus delbrueckiihelveticusdiaminopimelate (Lactobacillus biosynthetic process suntoryeus) de novo' CTPAMP biosyntheticbiosynthetic processprocess SneathiaSchizosaccharomyces amnii japonicus yFS275 carboxyliccell cycle acid metabolic process Lactobacillus hamsteriiners ribophagy de novo' pyrimidineCTP biosynthetic nucleobase process biosynthetic process unculturedSneathiaestablishment amnii Ruminococcus of endoplasmic sp.reticulum localization cell adhesiondivision Lactobacillus helveticusjensenii (Lactobacillus suntoryeus) deoxyribonucleotidede novo' pyrimidine nucleobase biosynthetic biosynthetic process process Wallemiauncultured ichthyophagapositive Ruminococcus regulation of histoneEXF sp. −H2B994 ubiquitination cell cycleproliferation Lactobacillus inersjohnsoniiendoplasmic reticulum membrane fusion fructosedeoxyribonucleotide 1,6−bisphosphate biosynthetic metabolic process process Wallemia ichthyophaga EXF−994 cell divisionredox homeostasis Lactobacillus jenseniikefiranofaciensretrograde protein transport, ER to cytosol gluconeogenesisfructose 1,6−bisphosphate metabolic process ER−associated ubiquitin−dependent protein catabolic process cell proliferationwall organization

1 LactobacillusMegasphaera johnsonii genomosp. type_1 glucosegluconeogenesis metabolic process ER−associated misfolded protein catabolic process cell− redoxcell adhesion homeostasis 1 cytoplasmLactobacillusPrevotella−proteasomal amnii ubiquitin kefiranofaciens−dependent protein catabolic process glutamineglucose metabolic biosynthetic process process cellcellular wall iron organization ion homeostasis nucleusMegasphaeraPrevotella−proteasomal sp. ubiquitin BV3P1genomosp.−dependent protein type_1 catabolic process glutamylglutamine−tRNA biosynthetic aminoacylation process cellchromosome−cell adhesion condensation Prevotellasterol amniitimonensis regulatory element binding protein cleavage glycineglutamyl biosynthetic−tRNA aminoacylation process from serine cellular protein complex disassembly cellularcyanate iron catabolic ion homeostasis process PrevotellaRhizoctonia sp. solani BV3P1 glycolyticglycine biosynthetic process process from serine ribosome−associated ubiquitin−dependent protein catabolic process chromosomeD−gluconate metaboliccondensation process PrevotellaRoseburiapositive timonensisinulinivorans regulation of protein localization to nucleus immuneglycolytic response process cyanatede novo' catabolicAMP biosynthetic process process RhizoctoniaSchizosaccharomyces solaninegative regulation japonicus of telomerase yFS275 activity intracellularimmune response protein transport nonfunctional rRNA decay Dde− gluconatenovo' CTP metabolic biosynthetic process process RoseburiaSneathia amnii inulinivorans lipoteichoicintracellular acidprotein biosynthetic transport process mitochondria−associated ubiquitin−dependent protein catabolic process de novo' AMPpyrimidine biosynthetic nucleobase process biosynthetic process Schizosaccharomycesuncultured Ruminococcuspiecemeal japonicus microautophagy sp. yFS275 of nucleus metaboliclipoteichoic process acid biosynthetic process dedeoxyribonucleotide novo' CTP biosynthetic biosynthetic process process SneathiaWallemiaSCF complex amniiichthyophaga disassembly in response EXF− to994 cadmium stress NADmetabolic biosynthesis process via nicotinamide riboside salvage pathway protein localization to Golgi apparatus defructose novo' 1,6pyrimidine−bisphosphate nucleobase metabolic biosynthetic process process uncultured Ruminococcus sp. NAD biosyntheticbiosynthesis processvia nicotinamide riboside salvage pathway mitotic sister chromatid separation deoxyribonucleotidegluconeogenesis biosynthetic process Wallemia ichthyophaga mitoticEXF spindle−994 disassembly negativeNAD biosynthetic regulation process of coagulation 1 fructoseglucose metabolic1,6−bisphosphate process metabolic process sister chromatid biorientation neutrophilnegative regulation degranulation of coagulation glutamine biosynthetic process ATP metabolic process gluconeogenesisnucleosideneutrophil degranulation metabolic process

1 malate metabolic process glutamyl−tRNA aminoacylation glucoseOthersnucleoside metabolic metabolic process process response to oxidative stress glycine biosynthetic process from serine glutaminephosphoenolpyruvateOthers biosynthetic− processdependent sugar phosphotransferase system 0 glutamylglycolytic− tRNAprocess aminoacylation0 60 40 20 postphosphoenolpyruvate−translational protein−dependent modification20 40 sugar phosphotransferase system glycineimmune biosynthetic response process− − − from serine proteinpost−translational dephosphorylation protein modification 0.02 0.04 intracellular protein transport glycolyticprotein foldingdephosphorylation process LOGFC 2 lipoteichoic acid biosynthetic processAdj.P.Val immuneprotein foldingresponse

2 protein initiator methionine removal metabolic process intracellularprotein refoldinginitiator protein methionine transport removal NAD biosynthesis via nicotinamide riboside salvage pathway lipoteichoicregulationprotein refolding of acid DNA biosynthetic−templated process transcription, elongation NAD biosynthetic process metabolicregulation process of transcription,DNA−templated DNA transcription,−templated elongation negative regulation of coagulation NADribosomeregulation biosynthesis biogenesisof transcription, via nicotinamide DNA−templated riboside salvage pathway neutrophil degranulation NADtranscription,ribosome biosynthetic biogenesis DNA process−templated negativenucleoside regulation metabolic of processcoagulation 0 0 translationtranscription, DNA−templated 20 20 neutrophilOthers degranulation BP − 0 0 0.02 0.04 translationaltranslation termination 20 20 phosphoenolpyruvate−dependent sugar phosphotransferase system

BP nucleoside metabolic process − LOGFC 0.02 0.04 NAtranslational termination Adj.P.Val post−translational protein modification LOGFC OthersNA

Adj.P.Val Species phosphoenolpyruvateprotein dephosphorylation−dependent sugar phosphotransferase system Species protein folding 2 post−translational protein modification protein dephosphorylationinitiator methionine removal protein foldingrefolding 2 proteinregulation initiator of DNA methionine−templated removal transcription, elongation proteinregulation refolding of transcription, DNA−templated regulationribosome biogenesisof DNA−templated transcription, elongation regulationtranscription, of transcription,DNA−templated DNA−templated 0 0 ribosometranslation biogenesis 20 20 BP − 0.02 0.04 transcription,translational termination DNA−templated LOGFC 0 0 Adj.P.Val translationNA 20 20 BP − 0.02 0.04 Species translational termination LOGFC Adj.P.Val NA Species bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a ribonucleoside-diphosphate reductase complex Low (n=5) S-layer High (n=5) CC small ribosomal subunit large ribosomal subunit peptidoglycan-based cell wall peptidoglycan biosynthetic process * cell wall organization * transcription, DNA-templated BP PEP-PTS * ATP metabolic process * response to oxidative stress regulation of cell shape * malate metabolic process

0 100 200 300 Log2-transformed GO iBAQ value

b p=0.0172* c p=0.0272 d p=0.0327 e p=0.0002* 2.3 2.4 4.8 2.3 2.2 4.6 2.3 2.2 2.1 4.4 2.2 value Membrane relative abundance relative Membrane 2.1 Cell wall organization relative abundance organization relative Cell wall 2.0 Regulation of cell shape relative abundance Regulation of cell shape relative 4.2 Peptidoglycan biosynthetic process relative abundance biosynthetic process relative Peptidoglycan 2.1 2.0 iBAQ 1.9 Plasma membrane CC membrane Plasma 4.0 Cell wall organization BP Cellwallorganization Regulation of cell shape BP BP shape cell of Regulation

0 1 0 1 0 1 0 1

Inflammatory response Inflammatory response Inflammatory response Inflammatory response Peptidoglycan biosynthetic process BP process biosynthetic Peptidoglycan f p=0.0092* g p=0.0327 h p=0.1372 i p=0.1372 3.0 1.8 1.6 1.6 2.5 transformed GO transformed - 1.6 1.4 1.4 10 2.0 1.2 1.2 Log 1.4 1.5 based cell wallCC cell based - layer CC layer Cell wall relative abundance relative Cell wall Membrane relative abundance relative Membrane - 1.0 1.0 1.0 S based cell wall relative abundance relative − based cell wall Peptidoglycan based cell wall relative abundance relative − based cell wall Peptidoglycan 1.2 Cell wall CC wall Cell Membrane CC CC Membrane 0.5 0.8 0.8 1.0 0.0

0 1 0 1 0 1 0 1 Peptidoglycan Inflammatory response Inflammatory response Inflammatory response Inflammatory response Lactobacillus isolates inducing low inflammatory responses (n=22) Lactobacillus isolates inducing high inflammatory responses (n=22) bioRxiv preprint doi: https://doi.org/10.1101/2020.03.10.986646; this version posted September 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a b Low Medium High V1 No Antibiotic treatment V2 V1 Antibiotic treatment V2 Visit 1 (n=74) 1 Visit 2 (n=74) High High 11 High 1 High 2 2 5 1 Low 1 1 1 1 9 1 High High 0 Medium Medium 13 0 Low Low Medium 8 −Medium1 Medium Medium 6 −1 1 4 −2 Standardized PC2 18.3%) Standardized PC2 (20.6%) −2 Low Low 10 −2 −1 06000 1 Low 1 −1 0 1 2 standardizedStandardized PC1 (29.8% PC1 explained (29.8%)standardized var.)Standardized PC1 (35.5% 4000PC1 (35.5%) explained var.) 2000 standardized PC2 (18.3% explained var.) PC2 (18.3% explained standardized standardized PC2 (20.6% explained var.) PC2 (20.6% explained standardized Visit 1 Visit 2 0 c Infla V1

LOG2 iBAQ High 40 40 40 4040 40 Low 30 3030 30 30 30 Medium 20 20 20 20 1020 10 Infla V2 10 0 0 High 10 Infla Infla10V1 V1 0 Low 0 High0 High Infla V1 Low0 Low Medium Infla V1 InflaMedium V1Medium

Sneathia.amnii Inflammation HighInflaInfla V1 V1Infla V2 Treatment Prevotella.amnii Infla InflaV2 V2 Treatment High Sneathia.amnii.1 Prevotella.amnii.1 Lactobacillus.iners High Atopobium.vaginae High High

Low Treatment Lactobacillus.iners.1 Low Lactobacillus.jensenii Gardnerella.vaginalis Atopobium.vaginae.1 High Antibiotic treatmenttreatment Lactobacillus.johnsonii Low Low Lactobacillus.crispatus Gardnerella.vaginalis.1 Low Lactobacillus.jensenii.1 Lactobacillus.delbrueckii Lactobacillus.crispatus.1 Lactobacillus.johnsonii.1 MediumMediumMedium Antifungal treatment Medium Lactobacillus.delbrueckii.1 MediumLow Antifungal treated Infla V2TreatmentTreatment 6000No treatment Infla V2 D3LTX3 D3LVS9 D6S624 D3LTX3 D3LVS9 D6S624 C8PAE9 D3LUF4 D3LW49 C8PAE9 D3LUF4 E1NP90 D3LW49 E1NP90 E1GVQ0 K1NQK4 E1GVQ0 K1NQK4 E1GWR9 E1GWR9 AntibioticAntibiotic treatment treatment Megasphaera.genomosp..type_1 D3LTX3.1 D3LTX3.1D3LVS9.1 D3LVS9.1 D6S624.1 D6S624.1 Medium C8PAE9.1 C8PAE9.1D3LUF4.1 D3LUF4.1D3LW49.1 D3LW49.1 E1NP90.1E1NP90.1 V2 E1GVQ0.1E1GVQ0.1 K1NQK4.1K1NQK4.1 V1 Infla V2 E1GWR9.1E1GWR9.1 A0A120DII0 A0A120DII0 Megasphaera.genomosp..type_1.1 A0A0J8F9A7 A0A135Z8S8A0A0J8F9A7 A0A135Z8S8

A0A0U0XF48 A0A0U0XF48 D3LTX3 D3LVS9 D6S624 C8PAV5 A0A0E9E2G5 A0A0U0XLH1 E3BTZ6 A0A0U0XX74 A0A133NZQ8A0A0E9E2G5 A0A1C2D6A4A0A0U0XLH1 A0A0U0XX74 A0A133NZQ8D3LVS9 A0A1C2D6A4D6S624 A0A120DII0.1 A0A120DII0.1 C8PAE9 C8PAS6 D3LTX3 AntifungalAntifungal treated treated C8PAE9 D3LUF4 C8PBF6 E1NP90 D3LVS9 D6S624 D3LUD9 A0A0U0XMC3 A0A0U0XMC3 D3LW49 K1MY98 C8PAE9 K1NQK4 E1NQZ0 D3LUF4 D3LW49 E1NP90 A0A0J8F9A7.1E1NMB0 A0A0J8F9A7.1 High

E1GVQ0 A0A135Z8S8.1 A0A135Z8S8.1 Infla V1 Infla V1 Infla V2 Infla V2

C8PBW2 K1NQK4 High A0A0U0XF48.1 A0A0U0XF48.1

C8PCW6 E1GVQ0 4000 E1GWR9 A0A0E9E2G5.1 A0A0U0XLH1.1A0A0E9E2G5.1 A0A0U0XX74.1 A0A0U0XLH1.1A0A133NZQ8.1A0A0U0XX74.1 A0A1C2D6A4.1A0A133NZQ8.1 A0A1C2D6A4.1 A0A0U0XMC3.1 A0A0U0XMC3.1 D3LTX3.1 E1GWR9 D6S624.1 D3LVS9.1

D3LVS9.1 C8PAV5.1 D6S624.1 E3BTZ6.1 D3LTX3.1 C8PAE9.1 D3LUF4.1C8PAE9.1 C8PAS6.1 E1NP90.1 D3LVS9.1 D6S624.1 C8PBF6.1

D3LW49.1 D3LUD9.1 K1MY98.1 C8PAE9.1 D3LUF4.1 E1NP90.1 K1NQK4.1 E1NQZ0.1 D3LW49.1 E1GVQ0.1 E1NMB0.1 No treatmentNo treatment K1NQK4.1 C8PBW2.1 E1GVQ0.1 E1GWR9.1C8PCW6.1 High E1GWR9.1 Treatment Treatment A0A120DII0 A0A120DII0 A0A120DII0

A0A228ZSI6 Infla V2 A0A0J8F9A7 Low A0A135Z8S8 A0A135P9T8 A0A0J8F9A7 Low A0A0U0XF48 A0A135Z8S8 A0A0E9E2G5 A0A0U0XLH1 A0A0U0XX74 A0A133NZQ8A0A0U0XF48 A0A1C2D6A4 A0A0U0XF48 A0A120DII0.1 A0A0U0XLH1 A0A0E9E2G5 A0A0U0XLH1 A0A0U0XX74 A0A133NZQ8 A0A1C2D6A4A0A120DII0.1 A0A120DII0.1 A0A0E9DXN4 D3LVS9 D6S624

A0A0U0XMC3 A0A0E9DWT8 C8PAV5 E3BTZ6 A0A228ZSI6.1 2000 C8PAE9 C8PAS6 A0A0U0XMC3 C8PBF6 D3LUD9 A0A0J8F9A7.1 K1MY98 E1NQZ0 A0A135Z8S8.1 Infla V1 Infla V2

A0A135P9T8.1 A0A0J8F9A7.1 Infla V1 Infla V2 E1NMB0 A0A0U0XF48.1 A0A135Z8S8.1 Infla V1 Infla V2 C8PBW2 A0A0U0XX74.1 A0A0U0XF48.1 A0A0E9E2G5.1 C8PCW6 A0A0U0XLH1.1 A0A133NZQ8.1 A0A1C2D6A4.1 A0A0U0XF48.1 Low A0A0U0XLH1.1 A0A0E9E2G5.1 A0A0U0XLH1.1 A0A0U0XX74.1 A0A133NZQ8.1 A0A1C2D6A4.1 A0A0E9DXN4.1 A0A0U0XMC3.1 Infla D3LVS9.1 D6S624.1 A0A0E9DWT8.1 C8PAV5.1 Infla E3BTZ6.1

C8PAE9.1 C8PAS6.1 A0A0U0XMC3.1 C8PBF6.1 D3LUD9.1 K1MY98.1 E1NQZ0.1 E1NMB0.1

C8PBW2.1 Medium C8PCW6.1 Medium

A0A120DII0 High A0A228ZSI6

A0A135P9T8 Medium 0 A0A0U0XF48 A0A0U0XLH1 A0A120DII0.1 A0A0E9DXN4 A0A0E9DWT8 A0A228ZSI6.1 A0A135P9T8.1 Treatment A0A0U0XF48.1 A0A0U0XLH1.1 A0A0E9DXN4.1 Visit 1 A0A0E9DWT8.1 Visit 2 Low Infla V1 d Infla V1 Infla V2 Agg Medium. LOG iBAQ High 600040 60002 Low 40003060004000 Medium 20002040002000 Infla V2 01020000 High InflaInfla V1 V1 Low 00 Medium InflaHigh V1High Sneathia.amnii Inflammation Infla V1 Infla V2 Treatment Prevotella.amnii Infla V1 Treatment Sneathia.amnii.1 Prevotella.amnii.1 Low Low Lactobacillus.iners Atopobium.vaginae High Lactobacillus.iners.1 Treatment Lactobacillus.jensenii Gardnerella.vaginalis Atopobium.vaginae.1 MediumHighMedium Antibiotic treatment Lactobacillus.crispatus Lactobacillus.johnsonii Lactobacillus.jensenii.1 Gardnerella.vaginalis.1 sp. Lactobacillus.delbrueckii sp. Lactobacillus.crispatus.1 Lactobacillus.johnsonii.1 Low Antifungal treatment Lactobacillus.delbrueckii.1 Antifungal treated iners amnii amnii iners amnii Low V2 amnii V1 InflaInfla V2 V2 jensenii vaginae jensenii vaginae crispatus johnsonii crispatus johnsonii Medium No treatment Sneathia.amnii delbrueckii delbrueckii Infla V1 Infla V2 Prevotella.amnii Megasphaera.genomosp..type_1 Medium Sneathia.amnii.1 HighHigh Prevotella.amnii.1 Megasphaera.genomosp..type_1.1 Lactobacillus.iners Sneathia Sneathia MegasphaeraPrevotella Atopobium.vaginae MegasphaeraPrevotella Infla V2 6000 Sneathia.amnii Lactobacillus.iners.1Sneathia.amnii Infla Infla Lactobacillus Infla V1 Infla V1 Infla V2 Infla V2 Lactobacillus.jensenii Gardnerella.vaginalis Lactobacillus Atopobium.vaginae.1 Low Low AtopobiumPrevotella.amnii Prevotella.amnii Atopobium Sneathia.amnii.1Sneathia.amnii.1 Infla V2 Lactobacillus.crispatus Lactobacillus.johnsoniiGardnerella vaginalis Gardnerella vaginalis Lactobacillus Lactobacillus Lactobacillus.jensenii.1 Gardnerella.vaginalis.1 Prevotella.amnii.1Prevotella.amnii.1 LactobacillusLactobacillus.iners Lactobacillus.iners Lactobacillus

D3LVS9 D6S624 4000 C8PAV5 E3BTZ6 Lactobacillus C8PAE9 C8PAS6 Lactobacillus C8PBF6 Lactobacillus.johnsonii.1 D3LUD9 Lactobacillus.delbrueckii Atopobium.vaginae Atopobium.vaginae Lactobacillus.crispatus.1 K1MY98 E1NQZ0

E1NMB0 High C8PBW2 C8PCW6

Treatment Treatment MediumMedium Lactobacillus Lactobacillus.iners.1 Lactobacillus.iners.1 D3LVS9.1 D6S624.1

Lactobacillus C8PAV5.1 E3BTZ6.1 Treatment C8PAE9.1 C8PAS6.1 C8PBF6.1 D3LUD9.1 Gardnerella.vaginalis Gardnerella.vaginalis Atopobium.vaginae.1Atopobium.vaginae.1 K1MY98.1 Lactobacillus.jensenii Lactobacillus.jensenii Lactobacillus.delbrueckii.1 E1NQZ0.1 E1NMB0.1 C8PBW2.1 C8PCW6.1 Lactobacillus.johnsonii Lactobacillus.johnsonii Lactobacillus.crispatus Lactobacillus.crispatus A0A120DII0 Gardnerella.vaginalis.1 Gardnerella.vaginalis.1 High Lactobacillus.jensenii.1 Lactobacillus.jensenii.1 A0A228ZSI6

A0A135P9T8 2000

Lactobacillus.delbrueckii A0A0U0XF48 Lactobacillus.delbrueckii Lactobacillus.crispatus.1 Lactobacillus.crispatus.1Lactobacillus.johnsonii.1 Lactobacillus.johnsonii.1 A0A0U0XLH1 A0A120DII0.1 Low A0A0E9DXN4 A0A0E9DWT8 A0A228ZSI6.1

A0A135P9T8.1 TreatmentTreatment Lactobacillus.delbrueckii.1A0A0U0XF48.1 Lactobacillus.delbrueckii.1 A0A0U0XLH1.1 A0A0E9DXN4.1 A0A0E9DWT8.1 Low Megasphaera.genomosp..type_1 Visit 1 Visit 2 Infla V1 Infla V2 Medium 0 e Megasphaera.genomosp..type_1.1 AntibioticAntibiotic treatment treatment Megasphaera.genomosp..type_1 Megasphaera.genomosp..type_1 Medium Infla V1 Megasphaera.genomosp..type_1.1Megasphaera.genomosp..type_1.1 AntifungalAntifungal treated treated Agg. LOG iBAQ High No40 treatmentNo 2treatment 200 Low 30 Medium 200150 20 Infla V2 150100 5010 High 100 Low 0 Medium Infla V1

Sneathia.amnii InflaInflammationInfla V1 V1 Infla V2 Treatment Prevotella.amnii 50 Treatment Sneathia.amnii.1 Prevotella.amnii.1 Lactobacillus.iners High Atopobium.vaginae

Lactobacillus.iners.1 Treatment PTS Lactobacillus.jensenii Gardnerella.vaginalis Atopobium.vaginae.1 - PTS High Antibiotic treatmenttreatment Lactobacillus.crispatuslayer Lactobacillus.johnsonii - layer - Lactobacillus.jensenii.1 Gardnerella.vaginalis.1 Low - V1 0 V2 S S.layer Lactobacillus.delbrueckii Lactobacillus.crispatus.1 Lactobacillus.johnsonii.1 cell.wall S S.layer.1 Antifungal treatment PEP.PTS S.layer cell.wall.1Lactobacillus.delbrueckii.1 Antifungal treated PEP templated membrane MediumLow - PEPPEP.PTS.1templated - membrane.1 S.layer.1 Infla V1 Medium No treatment based cell wall Megasphaera.genomosp..type_1 - RNDP complex based cell wall RNDP complex Infla V2 Infla Infla V1 Infla V2

- Infla cell.wall.organization Megasphaera.genomosp..type_1.1 regulation.of.cell.shape cell.wall.organization.1 High

malate.metabolic.process ABC.transporter.complex regulation.of.cell.shape.1 Infla V1 Infla V2 High response.to.oxidative.stress cell wall organization malate.metabolic.process.1 cell wall organization ABC.transporter.complex.1 Treatment Infla V2 transcription..DNA.templated peptidoglycan.based.cell.wall response.to.oxidative.stress.1 D3LVS9 D6S624 C8PAV5 E3BTZ6 C8PAE9 C8PAS6

ATP metabolic process C8PBF6 D3LUD9 K1MY98 transcription..DNA.templated.1E1NQZ0 regulation of cell shape large ribosomal subunit ATP metabolic processE1NMB0 peptidoglycan.based.cell.wall.1

C8PBW2 regulation of cell shape large ribosomal subunit C8PCW6 LowLow D3LVS9.1 D6S624.1

small ribosomal subunit C8PAV5.1 E3BTZ6.1 C8PAE9.1 smallC8PAS6.1 ribosomal subunit C8PBF6.1 D3LUD9.1 K1MY98.1 E1NQZ0.1

ABC transporter complex E1NMB0.1 Text color

malate metabolic process ABC transporter complex C8PBW2.1 C8PCW6.1 malate metabolic process Treatment peptidoglycan.biosynthetic.processcell.wall.organization A0A120DII0 High A0A228ZSI6 peptidoglycan.biosynthetic.process.1

response to oxidative stress A0A135P9T8 A0A0U0XF48

A0A0U0XLH1 response to oxidative stress A0A120DII0.1 Medium A0A0E9DXN4

A0A0E9DWT8 cell.wall.organization.1 ATP.metabolic.processregulation.of.cell.shape A0A228ZSI6.1 transcription, DNA large.ribosomal.subunit transcription, DNA A0A135P9T8.1 Medium peptidoglycan small.ribosomal.subunit A0A0U0XF48.1 A0A0U0XLH1.1

A0A0E9DXN4.1 peptidoglycan

A0A0E9DWT8.1 BP ATP.metabolic.process.1regulation.of.cell.shape.1 Low large.ribosomal.subunit.1small.ribosomal.subunit.1

malate.metabolic.process Infla V1 Infla V2

response.to.oxidative.stress peptidoglycan biosynthetic process malate.metabolic.process.1 peptidoglycan biosynthetic process CC transcription..DNA.templated InflaMedium V2 peptidoglycan.based.cell.wall response.to.oxidative.stress.1 transcription..DNA.templated.1 peptidoglycan.based.cell.wall.1 High peptidoglycan.biosynthetic.process peptidoglycan.biosynthetic.process.1 Low Medium

ribonucleoside.diphosphate.reductase.complex ATP.binding.cassette..ABC..transporter.complex ribonucleoside.diphosphate.reductase.complex.1 Treatment ATP.binding.cassette..ABC..transporter.complex.1 Antibiotic treated Antifungal treated No treatment

phosphoenolpyruvate.dependent.sugar.phosphotransferase.system

phosphoenolpyruvate.dependent.sugar.phosphotransferase.system.1