Additional File 1 for:

Massive decay and expansion of insertion sequences drive the evolution of a novel host-restricted bacterial pathogen

Gonzalo Yebra1,a, Andreas F Haag2,a, Maan M Neamah2,3,4, Bryan A Wee1, Emily J Richardson1, Pilar Horcajo5, Sander Granneman6, María Ángeles Tormo-Más7,8, Ricardo de la Fuente5, J Ross Fitzgerald1,*, José R Penadés2,7,9,*

1The Roslin Institute, University of Edinburgh, Edinburgh, United Kingdom; 2Institute of Infection, Immunity & Inflammation, University of Glasgow, Glasgow, United Kingdom; 3Faculty of Veterinary Medicine, University of Kufa, Kufa, Iraq; 4Middle Euphrates Centre for Cancer and Genetic Research, University of Kufa, Kufa, Iraq; 5Facultad de Veterinaria, Universidad Complutense de Madrid, Madrid, Spain; 6Centre for Synthetic and Systems Biology, University of Edinburgh, Edinburgh, United Kingdom; 7Departamento de Ciencias Biomédicas, Facultad de Ciencias de la Salud, Universidad CEU Cardenal Herrera, 46113 Moncada, Valencia, Spain; 8Severe Infection Group, Health Research Institute Hospital La Fe, Valencia, Spain; 9MRC Centre for Molecular Bacteriology and Infection, Imperial College London, SW7 2AZ, UK.

a These authors contributed equally.

* Corresponding authors: JRF ([email protected]), JRP ([email protected])

Table of Contents

Additional information

Supplementary Figures

Figure S1. Phylogenetic tree of the integrase gene from representative examples of S. aureus prophages.

Figure S2. Phylogenetic network of representative examples of S. aureus Pathogenicity Islands.

Figure S3. Expression of SaaPIMVF7-encoded vwb.

Figure S4. Schematic representation of IS loci selected for analysis.

Figure S5. Presence of the IS does not affect expression of the downstream gene product through active transcription.

Figure S6. Pairwise genome alignment of S. aureus subsp. anaerobius versus S. aureus subsp. aureus.


Additional Information

S. aureus subsp. anaerobius MLST variability

Multilocus Sequence Typing (MLST) of the whole genome sequences revealed more variability across strains than previously reported, which also supports the split of this lineage into two main clades. In contrast to previous results that ascribed all S. aureus subsp. anaerobius isolates to a single, homogeneous sequence type (ST1464) [1, 2], we found allelic variation in two of the seven housekeeping genes (pta and tpi). The four samples from Sudan presented an identical allelic profile, but it differed from the one deposited in the MLST database for ST1464. Specifically, these four sequences (including the genome made available by Elbir and collaborators [3]) carried allele 502 in the yqiL gene instead of allele 160, which is characteristic of ST1464 in the MLST database. A sequence comparison showed that these two alleles differ by one SNP. We found two tpi alleles in our sample set, which differed between the Sudanese and European clades (alleles 177 and 422, respectively), whereas three pta alleles were found within the European clade. In particular, a cluster of two Italian samples and one Danish sample was ascribed to ST3756, previously represented in the MLST database by a Czech sheep isolate.
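The allele comparison described above (two yqiL alleles differing by one SNP) can be illustrated with a minimal Python sketch. The function name `count_snps` and the two toy sequences are hypothetical and chosen only for illustration; they are not the real yqiL alleles 160 and 502, and the sketch assumes the two alleles are already aligned to the same length.

```python
def count_snps(allele_a: str, allele_b: str) -> int:
    """Count mismatched positions between two aligned, equal-length alleles."""
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must be aligned to the same length")
    # Compare position by position, ignoring letter case.
    return sum(1 for a, b in zip(allele_a.upper(), allele_b.upper()) if a != b)


# Hypothetical toy sequences, NOT the real yqiL alleles 160 and 502.
toy_allele_160 = "ATGGCTAAAGTT"
toy_allele_502 = "ATGGCTAAGGTT"

print(count_snps(toy_allele_160, toy_allele_502))
```

In practice the allele sequences would be taken from the MLST database rather than typed by hand.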

Pseudogenes found in S. aureus subsp. anaerobius

In the following paragraphs, we describe, grouped by function, some of the most noteworthy pseudogenes found. Their presence could explain the phenotypic peculiarities of S. aureus subsp. anaerobius with respect to S. aureus subsp. aureus, especially regarding virulence and metabolism.

Defence mechanisms. We found several pseudogenised oxidoreductases (Table S1 in Additional File 2), which points to a deficiency in protection from oxidative damage that could explain the microaerophilic nature of S. aureus subsp. anaerobius. The catalase gene was pseudogenised in all isolates, a feature of the S. aureus subsp. anaerobius genome already reported [4] and one of the main differences between the two subspecies of S. aureus. Catalase is regarded as a defensive mechanism against the oxygen radicals produced by macrophages, and indeed the restoration of catalase activity increases resistance to H2O2 but decreases virulence in sheep, the natural host [5]. This suggests that the loss of catalase activity plays an important role in host adaptation. Other proteins involved in the evasion of oxygen radicals, such as sodA and sodM, were, however, intact, with a protein identity of 98% and 99%, respectively, to their homologues in S. aureus RF122.

Other pseudogenes related to defence mechanisms were transmembrane pumps (most belonging to the ABC superfamily of transporters) involved in antibiotic resistance and fluoride toxicity, and type I restriction enzymes.

Virulence factors. Several pseudogenes encoded proteins involved in the biosynthesis of the capsular polysaccharide. Although the S. aureus subsp. anaerobius genome also carries intact genes involved in this process, the presence of these pseudogenes might imply a deficient bacterial capsule.

Other pseudogenes encoding virulence factors were adherence factors (clumping factor B, staphylococcal protein A), pore-inducing exoproteins (leukocidins lukD and lukE and γ- and α-haemolysins), and the IgG-binding protein sbi. The coagulase gene was intact in all European isolates but present as a pseudogene in the four Sudanese isolates. Contradictory results in the literature have reported coagulase activity in Spanish and Sudanese isolates but not in Kenyan or French ones [3, 6]. In addition, all S. aureus subsp. anaerobius isolates analysed here carried an intact vwb gene in SaaPIMVF7, which could influence the coagulation phenotype.

Metabolism and energy production. Twenty-nine of the 164 pseudogenes present in all isolates encoded enzymes involved in 12 amino acid metabolic pathways. The IMG annotation tool revealed that, while the isolate RF122 is auxotrophic for the amino acids lysine, phenylalanine, tyrosine, histidine and serine, S. aureus subsp. anaerobius is auxotrophic, in addition to those, for tryptophan, arginine and leucine. IMG also found intact aerobic respiration pathways in S. aureus subsp. anaerobius. This, together with the compromised defence mechanisms discussed above, suggests that this bacterium is a microaerophile because it lacks a response against oxidative stress rather than because it is unable to use aerobic respiration.
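The auxotrophy comparison above amounts to a simple set relationship, sketched here in Python. The variable names are hypothetical; the amino acid lists are taken directly from the text, and the sketch makes no claim about how IMG derives them.

```python
# Amino acids for which isolate RF122 is auxotrophic, as listed in the text.
rf122_auxotrophies = {"lysine", "phenylalanine", "tyrosine", "histidine", "serine"}

# Additional auxotrophies reported for S. aureus subsp. anaerobius.
additional_auxotrophies = {"tryptophan", "arginine", "leucine"}

# S. aureus subsp. anaerobius carries the RF122 set plus the additional ones.
anaerobius_auxotrophies = rf122_auxotrophies | additional_auxotrophies

print(sorted(anaerobius_auxotrophies))
print(len(anaerobius_auxotrophies))
```

The set difference `anaerobius_auxotrophies - rf122_auxotrophies` recovers the three amino acids that distinguish the two annotations.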

Fourteen and six pseudogenes encoded proteins involved in carbohydrate and lipid transport and/or metabolism, respectively. Some of the pathways affected by the presence of pseudogenes were the metabolism of sugars (galactose, fructose, sucrose and mannose), glycolysis/gluconeogenesis, pyruvate metabolism, and the phosphotransferase system (PTS, a major mechanism used for the uptake of carbohydrates). One of the pseudogenes encoded the acetyl-coenzyme A synthetase, which participates in many metabolic pathways.

Finally, enzymes involved in inorganic ion/coenzyme transport (specifically nickel, magnesium, manganese, cobalt, iron and molybdenum) were also present as pseudogenes.

S. aureus subsp. anaerobius novel mobile genetic elements

One novel prophage (ΦSaa1) belonging to the Siphoviridae family was found in all isolates, with a length of 43.2 kb and a GC content of 33.4%. It encodes 74 proteins in most cases (some isolates presented 73 due to the absence of an HNH endonuclease). The closest prophage found by PHASTER based on nucleotide similarity (90.2%) was Φ2958PVL (accession number NC_011344), a phage initially described in methicillin-resistant S. aureus subsp. aureus isolates in Japan [7] and encoding the Panton-Valentine leukocidin, which is otherwise absent in ΦSaa1 (Fig. 1B).

We inferred gene-by-gene homology between ΦSaa1 and Φ2958PVL using Roary [8]. Based on this comparison, 24 of the 74 genes in ΦSaa1 (32.4%) were split versions of 10 homologous genes in Φ2958PVL, whereas 37 (50%) were intact with homologues in Φ2958PVL, and 13 ΦSaa1 genes (17.6%) were exclusive. Most of the latter genes belonged to the lysogeny functional module. A phylogenetic tree of integrase sequences from representative S. aureus subsp. aureus phage lineages [9], constructed using FastTree v2.1.10, revealed that the ΦSaa1 integrase falls outside the major integrase groups (Fig. S1 in Additional File 1), with Sa2int (the group to which the integrase of Φ2958PVL belongs) being its closest relative. The nucleotide identity between the two integrases was 78%.
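As a quick arithmetic check of the gene categories reported above for ΦSaa1, the following Python sketch recomputes the percentages from the counts given in the text. The function name `category_percentages` and the dictionary keys are hypothetical labels introduced for illustration only.

```python
def category_percentages(counts: dict) -> dict:
    """Return each category's share of the total, as a percentage to 1 decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Gene counts for ΦSaa1 as stated in the text (74 genes in total).
phi_saa1_counts = {
    "split_versions": 24,         # split versions of 10 Φ2958PVL genes
    "intact_with_homologue": 37,  # intact, with homologues in Φ2958PVL
    "exclusive": 13,              # found only in ΦSaa1
}

phi_saa1_pct = category_percentages(phi_saa1_counts)
print(sum(phi_saa1_counts.values()), phi_saa1_pct)
```

The three categories add up to the 74 genes of ΦSaa1, and the recomputed shares agree with the 32.4%, 50% and 17.6% quoted in the text.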

The only putative virulence factor carried by ΦSaa1 was the virulence-associated protein E, virE, whose gene was also present as a pseudogene.

We also identified one novel, 13 kb-long Staphylococcal Pathogenicity Island (SaPI) present in all isolates, which encoded 21 proteins including copies of the Staphylococcal complement inhibitor (SCIN) and the von Willebrand factor-binding protein (vWbp) (Fig. 1B). vWbp has previously been demonstrated to show host-specific activity, being able to coagulate ruminant plasma [10, 11]. This new SaPI (SaaPIMVF7) is inserted in the groES-groEL site (attB type V), the same genomic location as other SaPIs such as SaPIov2 (ED133, ST133) and SaPIbov3 (RF122, ST151). We created a phylogenetic network of SaaPIMVF7 and previously described SaPIs using SplitsTree v4.14.6 and the NeighborNet algorithm [12] (Fig. S2 in Additional File 1). It showed that SaaPIMVF7 is closest to SaPIov2 (93.1% nucleotide identity), which was first described in S. aureus subsp. aureus isolates of the clonal complex CC133 (responsible for infections of small ruminants including sheep and goats) and also carries vWbp and SCIN (Fig. 1B). A comparison of the gene homology between SaaPIMVF7 and SaPIov2 revealed that 6 of the 21 SaaPIMVF7 genes (28.6%) were split versions of 3 genes found in SaPIov2, 11 (52.4%) were intact and 4 (19%) were exclusive to SaaPIMVF7. Pseudogenised genes included those encoding the integrase and primase, suggesting that SaaPIMVF7 is no longer mobile but stable in the chromosome, whereas the virulence factors vWbp and SCIN were among the intact ones. The SaaPIMVF7 genes encoding vWbp and SCIN have 95.9% and 88.6% nucleotide identity, respectively, with their counterparts in SaPIov2 in S. aureus subsp. aureus strain ED133. Of note, S. aureus subsp. anaerobius also has chromosomal copies of vwb and scn, of which the former is pseudogenised. The fact that the SaPI-encoded genes involved in host adaptation (vwb and scn) were transcribed (Fig. 1C) and apparently functional in SaaPIMVF7 (unlike most of the other genes, such as int or pri, which contained mutations in their coding regions) strongly suggested a key role in the pathogenicity of S. aureus subsp. anaerobius. To confirm the functionality of the vWbp protein, the SaaPIMVF7 vwb gene was cloned into the expression vector pCN51, and the ability of the expressed protein to coagulate ruminant plasma was tested. In support of our hypothesis, the SaaPIMVF7 vwb gene is functional (Fig. S3 in Additional File 1), confirming its role in virulence.
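Nucleotide identity values such as those quoted above are typically derived from a pairwise alignment. The Python sketch below shows one common definition, the share of matching positions among gap-free alignment columns. It is an assumption for illustration and not necessarily the definition used by the tools in this study; the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical positions among gap-free alignment columns.

    Assumes both sequences come from the same pairwise alignment, so they
    have equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100 * matches / compared


# Hypothetical toy alignment, NOT real vwb or scn sequences.
toy_identity = percent_identity("ATGCATGCAT", "ATGCATGAAT")
print(toy_identity)
```

Other definitions divide by the full alignment length or by the length of the shorter sequence, so identity values are only comparable when computed the same way.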

References to Additional Information

1. Elbir H, Feil EJ, Drancourt M, Roux V, El Sanousi SM, Eshag M, et al. Ovine clone ST1464: a predominant genotype of Staphylococcus aureus subsp. anaerobius isolated from sheep in Sudan. J Infect Dev Ctries. 2010; 4(4):235-238.
2. de la Fuente R, Ballesteros C, Bautista V, Medina A, Orden JA, Domínguez-Bernal G, et al. Staphylococcus aureus subsp. anaerobius isolates from different countries are clonal in nature. Vet Microbiol. 2011; 150(1-2):198-202.
3. Elbir H, Robert C, Nguyen TT, Gimenez G, El Sanousi SM, Flock JI, et al. Staphylococcus aureus subsp. anaerobius strain ST1464 genome sequence. Stand Genomic Sci. 2013; 9(1):1-13.
4. Sanz R, Marin I, Ruiz-Santa-Quiteria JA, Orden JA, Cid D, Diez RM, et al. Catalase deficiency in Staphylococcus aureus subsp. anaerobius is associated with natural loss-of-function mutations within the structural gene. Microbiology. 2000; 146(Pt 2):465-475.
5. de la Fuente R, Diez RM, Dominguez-Bernal G, Orden JA, Martinez-Pulgarin S. Restoring catalase activity in Staphylococcus aureus subsp. anaerobius leads to loss of pathogenicity for lambs. Vet Res. 2010; 41(4):41.
6. de la Fuente R, Suarez G. Respiratory deficient Staphylococcus aureus as the aetiological agent of "abscess disease". Zentralbl Veterinarmed B. 1985; 32(6):397-406.
7. Ma XX, Ito T, Kondo Y, Cho M, Yoshizawa Y, Kaneko J, et al. Two different Panton-Valentine leukocidin phage lineages predominate in Japan. J Clin Microbiol. 2008; 46(10):3246-3258.
8. Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MT, et al. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics. 2015; 31(22):3691-3693.
9. Goerke C, Pantucek R, Holtfreter S, Schulte B, Zink M, Grumann D, et al. Diversity of prophages in dominant Staphylococcus aureus clonal lineages. J Bacteriol. 2009; 191(11):3462-3468.
10. Guinane CM, Ben Zakour NL, Tormo-Mas MA, Weinert LA, Lowder BV, Cartwright RA, et al. Evolutionary genomics of Staphylococcus aureus reveals insights into the origin and molecular basis of ruminant host adaptation. Genome Biol Evol. 2010; 2:454-466.
11. Viana D, Blanco J, Tormo-Mas MA, Selva L, Guinane CM, Baselga R, et al. Adaptation of Staphylococcus aureus to ruminant and equine hosts involves SaPI-carried variants of von Willebrand factor-binding protein. Mol Microbiol. 2010; 77(6):1583-1594.
12. Huson DH, Bryant D. Application of phylogenetic networks in evolutionary studies. Mol Biol Evol. 2006; 23(2):254-267.


Figure S1. Phylogenetic tree of the integrase gene from representative examples of S. aureus prophages. In bold and blue the integrase of the phage found in S. aureus subsp. anaerobius (ΦSaa1). Reference sequences are labelled indicating integrase major group, phage and accession number.


Figure S2. Phylogenetic network of representative examples of Staphylococcus aureus Pathogenicity Islands. In bold and blue the SaPI found in S. aureus subsp. anaerobius (SaaPIMVF7). The red and blue circles indicate those SaPIs that harbour the genes vwb and scn. Reference sequences are labelled indicating SaPI name, isolate and accession number (of the SaPI sequence when available, of the original genome otherwise).


Figure S3. Expression of SaaPIMVF7-encoded vwb. The SaaPIMVF7 vwb gene was cloned into the expression vector pCN51 under the control of a cadmium-inducible promoter and transformed into a coagulase- and vWbp-deficient derivative of strain RN4220 (RN4220 coa::tetM Δvwb), and the ability of SaaPIMVF7 vWbp to coagulate ruminant plasma was assessed.


Figure S4. Schematic representation of IS loci selected for analysis. IS loci are shown for S. aureus subsp. anaerobius strain MVF84 and S. aureus strain RF122 representing the ancestral genomic context. (A-D) IS inserted at various distances from the downstream gene start codon. Note that in (A) IS insertion results in a 207 bp deletion in the intergenic region in strain MVF84 relative to strain RF122. (E-G) IS inserted downstream and in antisense orientation of target gene. (E&G) Locus in RF122 shows antisense orientation of downstream gene while in (F) downstream gene is in the same orientation as target gene for IS.


Figure S5. Presence of the insertion sequence (IS) does not affect expression of the downstream gene product through active transcription. Western blot analysis of the depicted expression constructs for assessing the impact of IS on the expression of (A) MgrA or (B) AdhA from the IS encoded promoter. 3x-FLAG-tagged protein-encoding genes containing or missing the IS were cloned into pCN47 and plasmids introduced into the S. aureus subsp. aureus strain RN4220 ∆spa for analysis. For a schematic of the locus in either S. aureus subsp. anaerobius MVF84 or S. aureus subsp. aureus RF122 refer to Figure S4.


Figure S6. Pairwise genome alignment of S. aureus subsp. anaerobius versus S. aureus subsp. aureus. The Artemis Comparison Tool (ACT) was used to compare both (MVF7 and RF122, respectively). Red and blue bars indicate regions of similarity in the same and inverted orientation, respectively. The 6 main inverted chromosomal regions are highlighted in green and numbered.
