Supplemental material. Ann Rheum Dis. BMJ Publishing Group Limited (BMJ) disclaims all liability and responsibility arising from any reliance placed on this supplemental material which has been supplied by the author(s).

Supplementary data

Supplementary methods

Sequencing

DNA was extracted from one swab per sample using an initial bead-beating step followed by extraction with the Maxwell 16 Tissue DNA Purification Kit and Maxwell 16 Research Instrument System (Promega, USA). DNA concentration was measured using a Qubit dsDNA BR Assay Kit (ThermoFisher, USA) and normalized to 5 ng/µl. The region of the 16S rRNA gene encompassing V5 to V8 was targeted using the 803F (TTAGAKACCCBNGTAGTC) and 1392wR (ACGGGCGGTGWGTRC) primers, modified to contain Illumina-specific adapter sequences (803F: TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTAGAKACCCBNGTAGTC, 1392wR: GTCTCGTGGGCTCGGGTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACGGGCGGTGWGTRC) [1]. Library preparation followed the workflow outlined by Illumina (#15044223 Rev.B). In the first stage, PCR products of ~590 bp were amplified according to the specified workflow, except that Q5 Hot Start High-Fidelity 2X Master Mix (New England BioLabs, USA) was substituted as the polymerase, under standard PCR conditions. The resulting PCR amplicons were purified using Agencourt AMPure XP beads (Beckman Coulter, USA). Purified DNA was indexed with unique 8 bp barcodes using the Illumina Nextera XT 384 Sample Index Kit A-D (Illumina FC-131-1002) under standard PCR conditions with Q5 Hot Start High-Fidelity 2X Master Mix. Indexed amplicons were pooled in equimolar concentrations and sequenced on a MiSeq Sequencing System (Illumina, USA) using paired-end sequencing with V3 300 bp chemistry, according to the manufacturer’s instructions, at the Australian Centre for Ecogenomics.
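The 803F and 1392wR primers contain IUPAC degenerate bases (K, B, N, W, R), so each primer represents a small pool of concrete oligonucleotides. The following Python sketch is an illustration only and not part of the sequencing workflow; the function names are hypothetical. It enumerates the sequences encoded by each locus-specific primer.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer))]

# Locus-specific primer sequences as given in the text (adapters omitted).
PRIMER_803F = "TTAGAKACCCBNGTAGTC"
PRIMER_1392WR = "ACGGGCGGTGWGTRC"

print(degeneracy(PRIMER_803F))    # K (2) x B (3) x N (4)
print(degeneracy(PRIMER_1392WR))  # W (2) x R (2)
```

The degeneracy is the product of the number of bases allowed at each position, so a single N contributes a factor of four.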

Sequence processing and statistical analysis

The sequences were initially processed using QIIME [2]. Sequences were demultiplexed, then quality filtered using Trimmomatic in single-end mode (SLIDINGWINDOW:4:15 LEADING:10 HEADCROP:23 MINLEN:250) [3]. The forward and reverse reads could not be joined; therefore, only forward reads were used for subsequent analyses. Operational taxonomic units (OTUs) were picked by open-reference against a Greengenes (v. 13_5)/Silva (v. 119) database using a 97% cut-off for percent identity. OTUs with counts higher than 10 in the extraction control (lysis buffer) were removed. The resulting OTU table was further filtered and normalized using the mixOmics package in R [4]. The prefiltering step removed low-abundance OTUs whose proportional counts across all samples were below 0.01% [4]. The data were normalized using total sum scaling (TSS) and then centered log-ratio (CLR) transformed to account for their compositional nature [4].
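The prefiltering, TSS and CLR steps can be illustrated with a minimal Python sketch. This is not the mixOmics implementation used in the study. The toy table, the function names and the offset of 1 added before taking logarithms (zero counts cannot be log-transformed) are all assumptions made for illustration.

```python
import math

def prefilter(counts, cutoff_percent=0.01):
    """Drop OTUs whose share of all counts (across samples) is not above the cutoff.

    counts: dict mapping sample name -> list of OTU counts (same OTU order).
    Returns the filtered table and the indices of the OTUs that were kept.
    """
    samples = list(counts)
    n_otus = len(counts[samples[0]])
    total = sum(sum(row) for row in counts.values())
    keep = [
        j for j in range(n_otus)
        if 100.0 * sum(counts[s][j] for s in samples) / total > cutoff_percent
    ]
    return {s: [counts[s][j] for j in keep] for s in samples}, keep

def tss(row, offset=1):
    """Total sum scaling: convert counts (plus offset) to proportions summing to 1."""
    shifted = [c + offset for c in row]
    total = sum(shifted)
    return [c / total for c in shifted]

def clr(proportions):
    """Centered log-ratio: log of each part minus the mean log of all parts."""
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

# Hypothetical table: 3 samples x 4 OTUs; the last OTU is extremely rare.
table = {
    "s1": [5000, 3000, 2000, 0],
    "s2": [4000, 4000, 1999, 1],
    "s3": [6000, 1000, 3000, 0],
}
filtered, kept = prefilter(table)
transformed = {s: clr(tss(row)) for s, row in filtered.items()}
```

After the CLR transform the values of each sample sum to zero, which removes the constant-sum constraint that makes raw proportions difficult to analyse with standard multivariate methods.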

Moentadj R, et al. Ann Rheum Dis 2021;0:1–9. doi: 10.1136/annrheumdis-2020-219009

We applied multivariate sparse PLS discriminant analysis (sPLS-DA [4]) to identify an OTU signature that best classified the RA vs HC groups. Repeated cross-validation was used to choose each sPLS-DA signature and to assess its performance based on the balanced classification error rate, which accounts for the unequal number of samples between groups. The importance of each OTU in the signature was displayed using loading barplots, where the length of the bar indicates the contribution of each OTU to the sPLS-DA component, and the colour indicates the group in which each OTU is most abundant. An sPLS-DA component is a linear combination of selected OTUs weighted by their loading weights. As such, an individual dysbiosis score can be obtained for each subject as a linear combination of the OTUs in the signature. Similarly, a predicted dysbiosis score was calculated for the FDR group using the same sPLS-DA model and OTU signature. sPLS-DA analyses, receiver operating characteristic (ROC) curves and area under the curve (AUC) values were obtained using the mixOmics package in R [4]. Semi-supervised hierarchical clustering using Euclidean distance and the Ward linkage method was plotted with the R base package. Stacked bar plots of phylogeny were plotted in R using the phyloseq, dplyr, tidyr, magrittr and ggplot2 packages [5-9].
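Two quantities in this paragraph lend themselves to a short illustration: the balanced error rate, and a component score computed as a weighted sum of selected OTUs. The Python sketch below uses hypothetical labels, loadings and CLR values. It is not the mixOmics code, which may additionally centre and scale the variables before projection.

```python
def balanced_error_rate(y_true, y_pred):
    """Average of the per-class error rates, so each class weighs equally."""
    classes = sorted(set(y_true))
    rates = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        errors = sum(1 for i in idx if y_pred[i] != c)
        rates.append(errors / len(idx))
    return sum(rates) / len(rates)

def component_score(clr_values, loadings):
    """Score of one subject on a component: weighted sum over the selected OTUs.

    clr_values: dict OTU id -> CLR-transformed abundance for the subject.
    loadings: dict OTU id -> loading weight (only selected OTUs are present).
    """
    return sum(weight * clr_values[otu] for otu, weight in loadings.items())

# Hypothetical example: 8 HC and 2 RA samples, one RA sample misclassified.
truth = ["HC"] * 8 + ["RA"] * 2
pred = ["HC"] * 8 + ["RA", "HC"]
ber = balanced_error_rate(truth, pred)  # (0/8 + 1/2) / 2

# Hypothetical loadings and CLR values for a single subject.
loadings = {"otu_a": -0.5, "otu_b": 0.25}
subject = {"otu_a": 2.0, "otu_b": 4.0, "otu_c": -6.0}
score = component_score(subject, loadings)  # -0.5*2.0 + 0.25*4.0
```

In this toy case the overall error rate is 1 in 10, whereas the balanced error rate is 0.25 because the single mistake falls in the smaller group.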

For analysis of functional effects of heat-killed Streptococci and SCW, multiple groups were compared by ANOVA, having tested for skewness, using GraphPad Prism v8.

Axenic cultures generated from oral swabs.

Oral swabs (n = 3 selected from the RA, FDR and HC groups) were cultured to obtain purified bacterial (axenic) Streptococcal colonies. Swabs were streaked onto selective nutritious Brain Heart Infusion (BHI) agar plates supplemented with 10% defibrinated horse blood. A total of 20 isolates of different bacterial colonies, separated based on their morphology (Supplementary Figure 1), were streaked onto the second batch of BHI plates. After 24 h incubation at 37 °C with 5% CO2, single colonies from BHI plates were streaked in triplicate onto Mitis Salivarius agar mixed with 1% potassium tellurite, which is selective for Streptococcus and Enterococcus. After two further rounds on Mitis plates, a single colony was inoculated into BHI broth for DNA extraction.

Streptococcus genus verification.

DNA extracted from bacterial isolates was quantified with a NanoDrop Lite Spectrophotometer (Thermo Scientific, Massachusetts, United States). Axenic cultures were verified as Streptococcus species using universal Streptococcus PCR primers (Supplementary Table 2, StrepGen) [10], with inclusion of mock-inoculated medium and no-template controls for amplification of extracted DNA. For Gram staining and microscopy, glass slides prepared from overnight cultures were stained with Crystal Violet, Iodine mordant, 95% ethanol and Safranin, then analyzed and photographed.

Phylogenetic Sanger Sequencing.

To determine the different Streptococcus species present in the axenic cultures, extracted DNA was sequenced (Australian Genome Research Facility, University of Queensland) after amplification in the presence of universal 16S rRNA gene targeting PCR primers 27F and 1492R. Streptococcus species were characterised with five universal 16S rRNA gene targeting PCR primers (Supplementary Table 3) (Integrated DNA Technologies, Iowa, United States) [11]. The intervening regions contain variable sequence data, which can be used to determine the best species match. The Sanger sequencing data were assembled using the Staden Package, and manually verified [12]. The consensus sequences were blasted against the bacterial 16S rRNA gene databases GreenGenes and Ribosomal Database Project to obtain the best Streptococcus match at the species level [13, 14].

Streptococcal sequencing and characterisation

Library preparation was performed using the Illumina Nextera XT DNA Library Preparation Kit (Illumina, CA). Libraries were sequenced at the Australian Centre for Ecogenomics using the Illumina NextSeq 500 platform with V2 chemistry generating approximately 1 Gb of 150 bp paired-end reads per isolate. Reads were quality and adapter trimmed using Trimmomatic v0.36 [3] with default settings plus a head crop of 10. Trimmed reads were merged using BBMerge (https://sourceforge.net/projects/bbmap/).

Assembly of each isolate was performed using SPAdes v3.11.1 [15] with k-mers 21, 33, 55 and 77. Genome completeness and contamination were assessed using CheckM v1.0.7 [16]. Taxonomic affiliation was determined using GTDB-Tk v1.2.0 [17] against the GTDB database release 04-RS89 [18, 19]. Assembled genomes were annotated using Prokka v1.12 [20]. Orthology between annotations was assessed using Roary v3.11.0 [21] with a minimum percentage identity of 70%. A maximum-likelihood tree was generated using IQ-TREE v1.6.9 [22] with ModelFinder, based on an alignment of 120 marker genes identified using GTDB-Tk [17]. The alignment was filtered for sites conserved in a minimum of 25% of genomes.
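The alignment filtering step can be sketched in Python under one possible reading of "conserved": a column is kept when its most common residue (ignoring gaps) is present in at least 25% of the sequences. This interpretation, the toy alignment and the function name are assumptions for illustration and may differ from the filter actually applied.

```python
from collections import Counter

def filter_alignment(seqs, min_fraction=0.25, gap="-"):
    """Keep columns whose most common residue covers >= min_fraction of sequences."""
    n = len(seqs)
    length = len(seqs[0])
    keep = []
    for col in range(length):
        # Count residues in this column, ignoring gap characters.
        residues = Counter(s[col] for s in seqs if s[col] != gap)
        top = residues.most_common(1)[0][1] if residues else 0
        if top / n >= min_fraction:
            keep.append(col)
    return ["".join(s[c] for c in keep) for s in seqs], keep

# Hypothetical protein alignment of five genomes.
alignment = [
    "MK-A",
    "MR-C",
    "MKQD",
    "MK-E",
    "MK-F",
]
filtered_alignment, kept_cols = filter_alignment(alignment)
```

Here the third column is mostly gaps and the fourth has no residue shared by more than one genome, so both fall below the 25% threshold and are dropped.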

SNPs and indels were identified using Snippy v4.4.3 (https://github.com/tseemann/snippy) based on alignment to the S. parasalivarius species representative GCF_001556435.1. Bedtools v2.26.0 [23] intersect was used to compare call lists between isolates. Mappings and mutations were visualised using IGV v2.4.9 [24]. SnpEff v4.3 [25] was used to predict the effect of mutations on annotated proteins as implemented in Snippy. EggNOG and COG annotations from the v5.0 database were assigned to predicted proteins using EggNOG mapper v2.0.1 [22, 26].
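Comparing call lists between isolates amounts to finding variants that are shared or unique. The Python sketch below is a simplified stand-in that matches variants exactly on contig, position, reference and alternate allele. It does not reproduce the interval-overlap logic of bedtools intersect, and the variant records are hypothetical.

```python
def load_calls(rows):
    """Turn (contig, position, ref, alt) rows into a set for comparison."""
    return {(contig, int(pos), ref, alt) for contig, pos, ref, alt in rows}

# Hypothetical calls for two isolates against the same reference genome.
isolate_a = load_calls([
    ("contig1", 101, "A", "G"),
    ("contig1", 250, "C", "T"),
    ("contig2", 75, "G", "GA"),
])
isolate_b = load_calls([
    ("contig1", 101, "A", "G"),
    ("contig2", 75, "G", "GA"),
    ("contig2", 900, "T", "C"),
])

shared = isolate_a & isolate_b   # variants called in both isolates
only_a = isolate_a - isolate_b   # variants unique to isolate A
only_b = isolate_b - isolate_a   # variants unique to isolate B
```

Set operations are sufficient when all isolates are called against the same reference with the same variant representation; interval-based tools are needed when calls can overlap without being identical.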

Streptococcal cell wall isolation

BHI broth was inoculated with thawed axenic cultures of streptococcal isolates from RA patients (21.1, 22.1, 23.2), FDRs (16.1 and 11.1) and a healthy control subject (2.1). After 24 h incubation, bacteria were streaked on blood agar plates to confirm purity, then grown to an OD600 of ~1.0, harvested and heat-killed. The pellet was sonicated for 5 min (Sonics Vibra-Cell™, Sonics & Materials Inc, Connecticut, United States), and cell wall fragments were then resuspended in 0.85% NaCl and 7.5 mg rhamnose/ml at 15 µg/g as described [27]. Cell wall fragments were sonicated again then centrifuged. The supernatant containing SCW was collected, filtered and adjusted to a final concentration of 3 mg rhamnose/ml. Commercial S. pyogenes PG-PS 10S (Becton Dickinson, New Jersey, USA) was used as a control. In some experiments, streptococcal isolates were heat-killed then resuspended in Tris buffer at 50 µg/ml.

Cytokine production by splenocytes in vitro

Splenocytes were incubated with varying concentrations of heat-killed bacteria from axenic cultures, SCW or 10S PG-PS SCW, or with 10 ng/ml LPS (Escherichia coli, serotype O111:B4) or SCW vehicle for 24 h in triplicate. The cell-free supernatant was analysed with a Cytometric Bead Array (CBA) Mouse Th1/Th2/Th17 Cytokine Kit (Becton Dickinson).

SCW arthritis induction in SKG mice

Female ZAP70W163C BALB/c (SKG) mice aged 7-16 weeks were maintained under specific pathogen-free conditions at the Translational Research Institute. All experiments were approved by the University of Queensland animal ethics committee. Female SKG mice (n=6 per group, 24 per experiment, 48 in the entire study) were injected i.p. once weekly for 3 weeks with either 15 µg/g body weight 10S PG-PS or SCW from axenic cultures, or once with 3 mg curdlan. Group size was based on prior experiments demonstrating the effect size and variance of the visual score and histological score. Litters were randomised over the groups and were age-matched according to availability. Treatment order and measurement by cage were randomised. Joints were monitored twice weekly and scored by an independent animal facility technician until the end point. The mice were weighed, and calipers were used to measure the paw and wrist width and thickness 3 times per week. At experimental completion, the animals were culled and their ankle joints collected. After decalcification, the tissue was embedded in paraffin before being sectioned (4 µm) and stained with H&E for histological assessment of inflammation by a blinded investigator as described previously [28]. No data were excluded from the analysis or figures.
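As a worked example of weight-based dosing at 15 µg/g, the Python sketch below computes the dose and injection volume for one mouse. It assumes, for illustration only, that the dose is expressed in the same units as the 3 mg rhamnose/ml SCW preparation and that this preparation is injected undiluted; the 20 g body weight and the function name are hypothetical.

```python
def scw_dose(body_weight_g, dose_ug_per_g=15.0, stock_mg_per_ml=3.0):
    """Return (dose in ug, injection volume in ul) for one mouse."""
    dose_ug = dose_ug_per_g * body_weight_g
    stock_ug_per_ul = stock_mg_per_ml  # 1 mg/ml is the same as 1 ug/ul
    volume_ul = dose_ug / stock_ug_per_ul
    return dose_ug, volume_ul

# Hypothetical 20 g mouse.
dose, volume = scw_dose(20.0)
print(f"{dose:.0f} ug in {volume:.0f} ul")
```

Under these assumptions a 20 g mouse would receive 300 µg in 100 µl per injection.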

Supplementary Table 1. Environmental influences questionnaire

Date

Participant Code

1) What is your date of birth?

2) Are you male or female?

3) Ethnic origin: Caucasian / Asian / Indigenous Australian / Other

4) Do you have rheumatoid arthritis? If Yes - How long have you had rheumatoid arthritis?

5) Do any of your relatives have rheumatoid arthritis? If Yes - Please list any of your relatives who have rheumatoid arthritis?

6) Do any medical conditions run in your family? If Yes - Please list any that you are aware of?

7) Please tick if YOU have now or in the past had any of these medical conditions. If you answer ‘Yes’ please write the age when it began below:

Medical Condition (with age when it began):
High Blood Pressure
Stroke/TIA
Heart Disease
High Cholesterol
Diabetes
Rheumatoid arthritis or other arthritis
Lupus
Thyroid Disease
Anaemia (low blood count) or other blood problem
Cancer
Psoriasis
Ulcerative Colitis or Crohn’s Disease
Coeliac Disease
Multiple Sclerosis
Sjogren’s Syndrome
Addison’s Disease
Autoimmune Hepatitis
Myositis
Vasculitis

8) Please list any other medical conditions that you have ever had that are not listed in the table above

9) Please list any medications, prescribed by a doctor that you are currently taking

10) Please list any other medicines, herbal remedies, vitamins or other supplements that you are currently taking

11) Have you ever been a regular smoker? Never

Yes but I stopped

How old were you when you started smoking?

How old were you when you stopped smoking?

Average no of cigarettes per day

Yes and I still smoke

How old were you when you started smoking?

Average no of cigarettes per day

12) Do you drink alcohol? If Yes - How many glasses of alcohol per DAY?

13) Do you have any regular exercise? If Yes please describe the exercise

If Yes numbers of hours per week

14) If you are female, have you had any pregnancies? If Yes how many children have you given birth to and when?

If Yes have any of your pregnancies lasted <6 months (miscarriage or termination)?

If Yes did you breastfeed any of your children and for how long?

15) If you are female have you ever been on the oral contraceptive pill (OCP) Never

Yes but I have ceased

How old were you when you started the OCP?

How old were you when you stopped the OCP?

Throughout your life how many years have you been on the OCP?

Yes and I am still on the OCP

How old were you when you started the OCP?

Throughout your life how many years have you been on the OCP?

16) If you are female have you been through menopause? If Yes – At what age did you cease menstruation?

17) Do you drink coffee (not decaffeinated)?

If Yes - How many cups of coffee per DAY?

18) How many times per week do you brush and/or floss your teeth?

19) How often do you see a dentist?

20) Have you had any of the following in the last 5 years? Tooth decay / Bleeding gums / Dental infections / Loss of teeth

21) Have you within the last 2 years suffered from any of these infection related illnesses?
Gastrointestinal infections with diarrhoea
Urinary tract infections
Genital infection
Prostatitis
Sinus infection requiring antibiotics
Strep or other throat infections requiring antibiotics
Tooth or gum infections (periodontitis, root or canal infections)
Influenza (with a positive blood or swab test)
Chest infection/pneumonia requiring antibiotics

22) Which of the following schooling/education have you completed? Primary School / Secondary School / Higher Education including apprenticeship or TAFE / University Education - Bachelor Degree / Post-graduate Qualification

23) Can you estimate your current household annual income? Less than $25 000

$26 000 - $50 000

$50 000 - $75 000

$75 000 - $100 000

More than $100 000

24) What is your current occupation?

Supplementary Table 2. Streptococcus species-specific primers.

| Primer name | Target | Sequence (5’-3’) | Reference |
|---|---|---|---|
| StrepGenForward | Streptococcus species | CGACGATACATAGCCGACCTGAG | [10] |
| StrepGenReverse | | TCCATTGCCGAAGATTCCCTACTG | |

Supplementary Table 3. Universal 16S PCR primers used during sequencing [10].

| Primer name | Target | Sequence (5’-3’) | E. coli position |
|---|---|---|---|
| 27F | Most Eubacteria | AGAGTTTGATCMTGGCTCAG | 8-27 |
| 1492R | Most Eubacteria, Archaebacteria | GGTTACCTTGTTACGACTT | 1492-1507 |

Supplementary Table 4. Selected OTUs from sPLS-DA ranked according to their importance in defining the first component, in decreasing contribution to sPLS-DA. The stability score assigned to each OTU indicates how often the OTU was selected across all cross-validation runs (here using 100 × 5-fold cross-validation), which represents the reproducibility of selecting each OTU. The table also includes, for each OTU, the P value from a Wilcoxon test of equality between RA and non-RA controls (HC), the taxa, and the group in which the OTU is abundant. The top discriminative OTUs to classify RA and HC patients belong to the Enterobacteriaceae, Peptostreptococcaceae, Veillonellaceae, Streptococcaceae, and Peptococcaceae families.

| OTU | Importance | Stability | Order | Family | Genus | Contrib to | Wilcox_P |
|---|---|---|---|---|---|---|---|
| 296668 | -0.48 | 1.00 | Enterobacteriales | Enterobacteriaceae | NA | HC | 0.00 |
| 4444285 | -0.34 | 0.99 | Clostridiales | Peptostreptococcaceae | Peptostreptococcus | HC | 0.02 |
| 4328910 | -0.32 | 1.00 | Clostridiales | Veillonellaceae | Veillonella | HC | 0.02 |
| OTU112 | -0.31 | 1.00 | Lactobacillales | Streptococcaceae | Streptococcus | HC | 0.03 |
| 4437876 | -0.29 | 0.98 | Flavobacteriales | Flavobacteriaceae | Capnocytophaga | HC | 0.05 |
| OTU599 | -0.26 | 0.99 | Clostridiales | Peptococcaceae | NA | HC | 0.03 |
| 4304654 | 0.24 | 0.96 | | Micrococcaceae | Rothia | RA | 0.04 |
| OTU1331 | -0.21 | 0.95 | Fusobacteriales | Leptotrichiaceae | NA | HC | 0.05 |
| 4481323 | 0.15 | 0.96 | Lactobacillales | Lactobacillaceae | Lactobacillus | RA | 0.05 |
| 3592276 | 0.15 | 0.81 | Pasteurellales | Pasteurellaceae | Haemophilus | RA | 0.05 |
| 3721699 | 0.12 | 0.87 | Neisseriales | Neisseriaceae | Neisseria | RA | 0.04 |
| 315792 | -0.12 | 0.81 | Neisseriales | Neisseriaceae | Neisseria | HC | 0.05 |
| 536098 | 0.11 | 0.93 | Bifidobacteriales | | | RA | 0.05 |
| 31235 | -0.11 | 0.75 | Fusobacteriales | Leptotrichiaceae | NA | HC | 0.06 |
| 139251 | -0.10 | 0.65 | Enterobacteriales | Enterobacteriaceae | NA | RA | 0.85 |
| OTU1049 | -0.09 | 0.74 | Clostridiales | Veillonellaceae | Veillonella | HC | 0.08 |
| 188099 | 0.09 | 0.84 | Clostridiales | Lachnospiraceae | NA | RA | 0.05 |
| 871559 | 0.09 | 0.92 | Corynebacteriales | Corynebacteriaceae | | RA | 0.05 |
| 4308270 | 0.08 | 0.64 | Bacteroidales | Prevotellaceae | Prevotella | RA | 0.17 |
| 4465095 | 0.08 | 0.64 | Campylobacterales | Campylobacteraceae | Campylobacter | RA | 0.06 |
| OTU1062 | -0.08 | 0.68 | Oceanospirillales | Oceanospirillaceae | NA | HC | 0.06 |
| 925977 | -0.07 | 0.64 | Neisseriales | Neisseriaceae | Neisseria | HC | 0.06 |
| 4335302 | 0.07 | 0.81 | Clostridiales | Ruminococcaceae | NA | RA | 0.06 |
| 450712 | 0.06 | 0.60 | Lactobacillales | Streptococcaceae | Streptococcus | RA | 0.10 |
| 188650 | 0.06 | 0.76 | Clostridiales | Lachnospiraceae | Dorea | RA | 0.06 |
| 4403310 | 0.06 | 0.76 | Enterobacteriales | Enterobacteriaceae | NA | RA | 0.06 |
| 4473575 | 0.05 | 0.73 | Bacteroidales | Bacteroidaceae | Bacteroides | RA | 0.06 |
| 528380 | 0.05 | 0.55 | Clostridiales | Lachnospiraceae | Moryella | RA | 0.11 |
| 339513 | -0.05 | 0.55 | Neisseriales | Neisseriaceae | Neisseria | HC | 0.08 |
| 4380191 | 0.04 | 0.65 | Clostridiales | Lachnospiraceae | Roseburia | RA | 0.06 |
| 4332291 | 0.04 | 0.65 | Lactobacillales | Lactobacillaceae | Lactobacillus | RA | 0.06 |
| 4376650 | 0.04 | 0.65 | Rikenellales | Rikenellaceae | Alistipes | RA | 0.06 |
| 4466964 | 0.04 | 0.65 | Bacteroidales | Bacteroidaceae | Bacteroides | RA | 0.06 |
| 167759 | 0.04 | 0.55 | Lactobacillales | Streptococcaceae | Streptococcus | RA | 0.16 |
| 183651 | 0.04 | 0.55 | Clostridiales | Lachnospiraceae | Blautia | RA | 0.06 |
| 4484108 | -0.03 | 0.48 | Bacteroidales | Prevotellaceae | Prevotella | HC | 0.08 |
| 2281511 | 0.03 | 0.47 | Clostridiales | Ruminococcaceae | Faecalibacterium | RA | 0.06 |
| 261731 | 0.03 | 0.44 | Caulobacterales | Caulobacteraceae | Brevundimonas | RA | 0.07 |
| 357770 | 0.02 | 0.43 | Clostridiales | Ruminococcaceae | Faecalibacterium | RA | 0.08 |
| 908322 | 0.02 | 0.43 | Pasteurellales | Pasteurellaceae | Haemophilus | RA | 0.03 |
| 4463709 | 0.02 | 0.48 | Clostridiales | Lachnospiraceae | Roseburia | RA | 0.07 |
| 2026147 | 0.02 | 0.38 | Lactobacillales | Streptococcaceae | Streptococcus | RA | 0.05 |
| 4309707 | 0.02 | 0.42 | Bacteroidales | Porphyromonadaceae | Parabacteroides | RA | 0.07 |
| 554647 | -0.01 | 0.41 | CW040 | NA | NA | HC | 0.14 |
| 544419 | 0.01 | 0.42 | Lactobacillales | Streptococcaceae | Streptococcus | RA | 0.08 |
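The stability score reported in Supplementary Table 4 is described as the frequency with which an OTU is selected across cross-validation runs. A minimal Python sketch of that calculation is given here, with hypothetical OTU names and only four runs rather than the 100 × 5-fold scheme used in the study; it is not the mixOmics implementation.

```python
from collections import Counter

def stability(selections):
    """Fraction of cross-validation runs in which each OTU was selected.

    selections: list of sets, one set of selected OTU ids per run.
    """
    counts = Counter(otu for run in selections for otu in run)
    n_runs = len(selections)
    return {otu: counts[otu] / n_runs for otu in counts}

# Hypothetical selections from 4 runs.
runs = [
    {"otu_a", "otu_b"},
    {"otu_a", "otu_c"},
    {"otu_a", "otu_b"},
    {"otu_a"},
]
scores = stability(runs)
```

An OTU selected in every run has a stability of 1.00, while one selected in only a quarter of the runs has a stability of 0.25.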

Supplementary Table 5. Tests for association between dysbiosis score and additional clinical and environmental covariate information for RA and HC. P values are reported from a Pearson’s correlation test for continuous and count variables (derived from questions, Supplementary Table 1), and from a logistic regression for categorical covariates (indicated as capitalised). P-values were adjusted for multiple testing with the False Discovery Rate correction [1].

| Variable | P value | FDR P value |
|---|---|---|
| age | 0.52 | 0.62 |
| SEX | 0.23 | 0.54 |
| bmi | 0.26 | 0.54 |
| tender.joint | 0.31 | 0.58 |
| swollen.joint | 0.08 | 0.54 |
| antiCCP | 0.50 | 0.62 |
| SMOKER | 0.04 | 0.41 |
| alcohol | 0.04 | 0.41 |
| exercise | 0.45 | 0.61 |
| coffee | 0.18 | 0.54 |
| children | 0.35 | 0.61 |
| pill | 0.14 | 0.54 |
| MEDICATION | 0.12 | 0.54 |
| brush.teeth | 0.44 | 0.61 |
| dentist | 0.39 | 0.61 |
| TOOTH.DECAY | 0.81 | 0.82 |
| BLEEDING.GUM | 0.21 | 0.54 |
| TOOTH.LOSS | 0.68 | 0.76 |
| INCOME | 0.82 | 0.82 |
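The caption of Supplementary Table 5 names a False Discovery Rate correction without specifying the procedure. The Python sketch below shows the widely used Benjamini-Hochberg adjustment as an assumption; the toy P values are hypothetical and the sketch is not intended to reproduce the table.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values.
toy = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(toy)
```

Each raw P value is multiplied by the number of tests divided by its rank, and a running minimum from the largest rank downward keeps the adjusted values monotone.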

Supplementary Table 6. Best genome hits for the discriminatory Streptococcus OTUs from Supplementary Table 4.

| OTUs | Contrib to | Best hit SILVA ACT (https://www.arb-silva.de/aligner/; August 2020) | Best genome hit: Recovered isolates (public genome if higher identity found) |
|---|---|---|---|
| OTU112 | HC | 87% Bacteria; ; Gammaproteobacteria; Burkholderiales; Neisseriaceae; Neisseria | 92.9% S. parasanguinis_B isolates (94.4% Streptococcus pneumoniae GCF_001909725.1) |
| 450712 | RA | 100% Bacteria; | 100% S. parasalivarius & S. salivarius isolates |
| 167759 | RA | 100% Bacteria; Firmicutes; Bacilli; Lactobacillales | 100% S. parasanguinis_B isolates |
| 2026147 | RA | 99.2% Bacteria; Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Streptococcus | 99.2% S. parasalivarius & S. salivarius isolates (100% Streptococcus sp000187445 GCF_000257585.1) |
| 544419 | RA | 99.6% Bacteria; Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Streptococcus | 99.6% S. parasalivarius & S. salivarius isolates |

Supplementary References

1. Kunin V, Engelbrektson A, Ochman H, Hugenholtz P. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol. 2010 Jan; 12(1):118-123.
2. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Meth. 2010; 7(5):335-336.
3. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014 Aug 1; 30(15):2114-2120.
4. Lê Cao K-A, Costello M-E, Lakis VA, Bartolo F, Chua X-Y, Brazeilles R, et al. MixMC: A Multivariate Statistical Framework to Gain Insight into Microbial Communities. PLoS ONE. 2016; 11(8):e0160169.
5. McMurdie PJ, Holmes S. phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data. PLoS ONE. 2013; 8(4):e61217.
6. Wickham H, François R, Henry L, Müller K, Team RC. dplyr: A Grammar of Data Manipulation. R package version 0.7.6. 2018. Available from: https://CRAN.R-project.org/package=dplyr
7. Wickham H, Henry L, Team RC. tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. R package version 0.8.1. 2018. Available from: https://CRAN.R-project.org/package=tidyr
8. Bache SM, Wickham H. magrittr: A Forward-Pipe Operator for R. R package version 1.5. 2014. Available from: https://CRAN.R-project.org/package=magrittr
9. Wickham H, Chang W, Henry L, Pedersen TL, Takahashi K, Wilke C, et al. ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. R package version 3.0.0. 2018. Available from: https://CRAN.R-project.org/package=ggplot2
10. O'Cuiv P, Aguirre de Carcer D, Jones M, Klaassens ES, Worthley DL, Whitehall VL, et al. The effects from DNA extraction methods on the evaluation of microbial diversity associated with human colonic tissue. Microb Ecol. 2011 Feb; 61(2):353-362.
11. Klindworth A, Pruesse E, Schweer T, Peplies J, Quast C, Horn M, et al. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Res. 2013 Jan 7; 41(1):e1.
12. Bonfield JK, Smith KF, Staden R. A new DNA sequence assembly program. Nucleic Acids Res. 1995; 23:4992-4999.
13. DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol. 2006 Jul; 72(7):5069-5072.
14. Cole JR, Chai B, Farris RJ, Wang Q, Kulam SA, McGarrell DM, et al. The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res. 2005 Jan 1; 33(Database issue):D294-296.
15. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012 May; 19(5):455-477.
16. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015 Jul 1; 25(7):1043-1055.
17. Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 2019; 36(6):1925-1927.
18. Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil P-A, et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology. 2018 Nov 1; 36(10):996-1004.
19. Parks DH, Chuvochina M, Chaumeil P-A, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nature Biotechnology. 2020 Apr 27.
20. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014 Jul 15; 30(14):2068-2069.
21. Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MTG, et al. Roary: rapid large-scale pan genome analysis. Bioinformatics. 2015; 31(22):3691-3693.
22. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Molecular Biology and Evolution. 2014; 32(1):268-274.
23. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15; 26(6):841-842.
24. Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013 Mar; 14(2):178-192.
25. Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w(1118); iso-2; iso-3. Fly (Austin). 2012 Apr-Jun; 6(2):80-92.
26. Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, von Mering C, et al. Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper. Molecular Biology and Evolution. 2017; 34(8):2115-2122.
27. Dische Z, Shettles LB. A specific color reaction of methylpentoses and a spectrophotometric micromethod for their determination. J Biol Chem. 1948 Sep; 175(2):595-603.
28. Benham H, Rehaume LM, Hasnain SZ, Velasco J, Baillet AC, Ruutu M, et al. Interleukin-23 mediates the intestinal response to microbial beta-1,3-glucan and the development of spondyloarthritis pathology in SKG mice. Arthritis Rheumatol. 2014 Jul; 66(7):1755-1767.
