bioRxiv preprint doi: https://doi.org/10.1101/415422; this version posted September 12, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Identification and capsular serotype sequetyping of Streptococcus pneumoniae strains

Lucia Gonzales-Siles (a,b,*), Francisco Salvà-Serra (a,b,c,d), Anna Degerman (a), Rickard Nordén (a), Magnus

Lindh (a), Susann Skovbjerg (a,b), Edward R. B. Moore (a,b,d).

a Department of Infectious Diseases, Institute of Biomedicine, University of Gothenburg,

Gothenburg, Sweden

b Centre for Antibiotic Resistance Research (CARe), University of Gothenburg, Gothenburg,

Sweden

c Microbiology, Department of Biology, University of the Balearic Islands, Palma de Mallorca,

Spain

d Culture Collection University of Gothenburg (CCUG), Department of Clinical Microbiology,

Sahlgrenska University Hospital, Gothenburg, Sweden

* Corresponding author

E-mail address: [email protected] (LG)

Post address: Guldhedsgatan 10A 41346 Gothenburg, Sweden

ABSTRACT

Correct identification of Streptococcus pneumoniae (pneumococcus) and differentiation from the closely

related species of the Mitis group of the genus Streptococcus, as well as serotype identification, is important

for monitoring disease epidemiology and assessing the impacts of pneumococcal vaccines. In this study, we

assessed the taxonomic identifications of 422 publicly available genome sequences of S. pneumoniae, S.

pseudopneumoniae and S. mitis, using different methods. Identification of S. pneumoniae, by comparative

analysis of the groEL partial sequence, was possible and accurate, whereas S. pseudopneumoniae and S.

mitis could be misclassified as S. pneumoniae, suggesting that groEL is unreliable as a biomarker for

differentiating S. pneumoniae from its closest related species. The genome sequences of S. pneumoniae and

S. pseudopneumoniae fulfilled the suggested thresholds of average nucleotide identity (ANI), i.e., >95%

genome sequence similarity to the sequence of respective type strains for identification of species, whereas

none of the S. mitis genome sequences fulfilled this criterion. However, ANI analyses of all sequences

versus all sequences allowed discrimination of the different species by clustering, with respect to species

type strains. The in silico DNA-DNA distance method was also inconclusive for identification of S. mitis

genome sequences, whereas presence of the “Xisco” gene proved to be a reliable biomarker for S.

pneumoniae identification. Furthermore, we present an improved sequetyping protocol including two

newly-designed internal sequencing primers with two PCRs, as well as an improved workflow for

differentiation of serogroup 6 types. The proposed sequetyping protocol generates a more specific product

by generating the whole gene PCR-product for sequencing, which increases the resolution for identification

of serotypes. Validations of both protocols were performed with publicly available S. pneumoniae genome

sequences, reference strains at the Culture Collection University of Gothenburg (CCUG), as well as with

clinical isolates. The results were compared with serotype identifications, using real-time Q-PCR analysis,

as well as the Quellung reaction or panel gel-precipitation. Our protocols provide a reliable

diagnostic tool for taxonomic identification as well as serotype identification of S. pneumoniae.

Keywords: Streptococcus pneumoniae; serotype; sequetyping; Mitis group

INTRODUCTION

Streptococcus pneumoniae (pneumococcus) causes invasive and non-invasive disease, including

pneumonia, meningitis, sepsis, otitis media and sinusitis, among others, particularly in children under

the age of 5 years and the aged (Johnson et al., 2010), leading to approximately a million deaths

annually in children aged less than 5 years, globally (Collaborators, 2017). A characteristic feature

and the main virulence factor of S. pneumoniae is the polysaccharide capsule that enables the

bacterium to evade host defence mechanisms (Nelson et al., 2007) and which is the basis for

epidemiological categorization of pneumococcal isolates and strains into serotypes and serogroups

(Geno et al., 2015). To date, 97 different capsular serotypes within 46 serogroups of S. pneumoniae

have been identified on the basis of the biochemical structure of the capsular polysaccharide (Geno

et al., 2015).

Several pneumococcal vaccines, which differ according to the polysaccharide capsule composition,

have been developed. The first pneumococcal conjugate vaccine (PCV), licensed in 2000, covered

7 serotypes (PCV7: 14, 6B, 19F, 23F, 4, 9V, 18C) (Hicks et al., 2007), followed by PCV10 (PCV7

serotypes plus serotypes 1, 5, and 7F) in 2009 (Esposito and Principi, 2015), and PCV13 (PCV10

serotypes plus serotypes 3, 6A, and 19A) in 2010 (Geno et al., 2015). A 15-valent conjugate

vaccine is currently in clinical trials, and also includes serotypes 22F and 33F (LeBlanc et al.,

2017). The pneumococcal polysaccharide vaccine (PPSV23) protects against 23 different capsular

types (1, 2, 3, 4, 5, 6B, 7F, 8, 9N, 9V, 10A, 11A, 12F, 14, 15B, 17F, 18C, 19A, 19F, 20, 22F, 23F

and 33F), and covers a high percentage of the types found in pneumococcal bloodstream infections.

The vaccine is widely used for adults who are considered to be at high risk, as well as in children

older than 2 years and at increased risk for pneumococcal disease (Diao et al., 2016). The use of

the conjugate vaccines has significantly reduced the burden of pneumococcal disease in many

populations. However, since vaccine introduction, “serotype replacement” has been observed, with

increases in the proportions of invasive and non-invasive disease caused by pneumococcal

serotypes not covered by the vaccines (Hicks et al., 2007; Weinberger et al., 2011).

S. pneumoniae serotype 6C is an example of an opportunistic increase in an infectious

pneumococcus through serotype replacement. Serotype 6C was described as a newly-recognized

serotype in 2007 (Mavroidi et al., 2004) and appears to have been rare in pre-vaccination

populations. However, since the introduction of PCV7, the incidence of serotype 6C in disease and

carriage has increased in diverse populations, worldwide (Loman et al., 2013). PCV7 contains

polysaccharide from the 6B serotype capsule and PCV13 later included capsular polysaccharide of

serotype 6A, although current vaccines do not extend protection to serotype 6C, which likely has

promoted the observed serotype replacement (Park et al., 2008). Such serotype transitions

demonstrate the importance of maintaining surveillance programs and clinical protocols that are

able to respond to the evolutionary plasticity of infectious disease.

The classical serotyping method, the Quellung reaction, is based on the reaction of serotype-

specific antisera with the corresponding capsule (Neufeld F, 1910). This method is time-consuming

and costly, requiring live, cultivable bacteria and a high degree of expertise, to the point that few

laboratories are able to carry out the analyses. During the last decade, the nucleotide sequences of

the capsule polysaccharide synthesis (CPS) loci (cps), harbouring the genes responsible for

synthesis of the pneumococcal polysaccharide capsule, have been determined for all known

serotypes. Accordingly, DNA amplification-based methods targeting specific capsular synthesis

genes that allow differentiation of the serotypes have been developed, i.e., sequential multiplex

PCR and sequential real-time Q-PCR (Pai et al., 2006; Varghese et al., 2017). Recently, a PCR-

amplification and DNA-sequence-based typing method, ‘sequetyping’, was described targeting the

regulatory gene, cpsB, with a single multiplex PCR, enabling the amplifications of 84 serotypes

and sequencing of PCR-products, differentiating 46 of the 93 serotypes recognized at that time

(Leung et al., 2012).

As an important practical step, before initiating pneumococcal serotype identification, it is critical

to confirm the identification of S. pneumoniae and differentiate it from the other species of the

Mitis group of the genus Streptococcus (Kawamura et al., 1995). The most closely-related species

of S. pneumoniae are S. pseudopneumoniae and S. mitis. Sequencing of the 16S rRNA genes

identifies a cytosine nucleotide at position 203 as a pneumococcal sequence signature, with an

adenosine residue in all other species of the Mitis group (Scholz et al., 2012). Partial sequence

determinations of individual metabolic ‘housekeeping’ genes, as a multi-locus sequence analysis

(MLSA) (Bishop et al., 2009), continue to be widely used for identifying strains at the species

level; groEL, gyrB, rpoB and sodA have been described as housekeeping biomarker genes for

identification of species within the genus Streptococcus (Glazunova et al., 2009;

Hoshino et al., 2005; Kawamura et al., 1999; Teng et al., 2002). Additionally, the recently

described “Xisco” gene, which has been demonstrated to be a unique biomarker for S. pneumoniae,

provides a new approach for confirming specific differentiation between S. pneumoniae and its

close relatives of the Mitis group (Salvà-Serra et al., 2017). Genome-based methods, such as

average nucleotide identity (ANI) and in silico DNA-DNA hybridization, are gaining recognition

as robust measurements of relatedness between strains, with potential for confirming phylogenetic

and taxonomic relationships in bacterial identification (Konstantinidis and Tiedje, 2005;

Meier-Kolthoff et al., 2013).

In this study, we present an improved workflow for pneumococcal serotype identification,

including subtyping within serogroup 6 by sequetyping, as well as S. pneumoniae species

confirmation and differentiation from closely related streptococcal species.

METHODS

Bacterial strains

One-hundred thirty-eight pneumococcus strains with identified serotypes were obtained from the

Culture Collection University of Gothenburg, Gothenburg, Sweden (CCUG), where they were

maintained in lyophilized state for long-term storage. The serotypes of these strains were

determined at the Statens Serum Institut in Copenhagen, Denmark, by the Quellung reaction

(Slotved et al., 2016), or at the Public Health Agency of Sweden, using an antiserum panel gel-

precipitation protocol (Jauneikaite et al., 2015). Additionally, 50 strains, isolated from blood and

cerebrospinal fluid samples during 2013 and 2014 and identified as S. pneumoniae at the Clinical

Microbiological Laboratory, Sahlgrenska University Hospital, Gothenburg, Sweden, were

included in the study. The strains isolated from clinical samples were stored in freeze-drying

medium (Fry and Greaves, 1951), at -70 °C.

Genome sequence data

A local database was created, including all genome sequences of S. pneumoniae (n=328) that were

available in GenBank (Benson et al., 2017) on the 14th March 2015, plus the type strain S.

pneumoniae NCTC 7465T (GenBank accession number: LN831051) and all genome sequences

that were available in GenBank on the 18th May 2016 for 14 other species of the Mitis group

(n=248): S. pseudopneumoniae (n=40), S. mitis (n=53), S. australis (n=2), S. cristatus (n=16), S.

dentisani (n=2), S. gordonii (n=22), S. infantis (n=7), S. massiliensis (n=2), S. oralis (n=34), S.

parasanguinis (n=29), S. peroris (n=1), S. sanguinis (n=33), S. sinensis (n=1) and S. tigurinus

(n=6) (Jensen et al., 2016).

DNA extraction

The stored strains from CCUG or clinical samples were inoculated onto Blood Agar plates with

horse blood 5% (prepared at the Substrate Department, Clinical Microbiological Laboratory,

Sahlgrenska University Hospital), and incubated overnight at 36 °C with 5% CO2. DNA was

extracted, using a ‘heat-shock’ protocol (Welinder-Olsson et al., 2000). Briefly, an inoculating

loop-full of bacterial biomass was suspended and incubated in 100 μL Tris-EDTA buffer and 15

μL lysostaphin 0.05 μM (Sigma-Aldrich, St. Louis, MO, USA) at 37 °C for 1 hour. Subsequently,

10 μL of Proteinase K (Sigma-Aldrich, St. Louis, MO, USA) were added and the suspensions were

incubated for 2 hours at 56 °C. Finally, the samples were incubated at 95 °C for 10 minutes. After

incubation, samples were centrifuged at 17,900 x g for 10 min. The supernatant, containing

genomic DNA, was transferred to a new tube and stored at -20 °C.

For multiplex PCR analyses, bacterial DNA was extracted, using a MagNA Pure LC instrument

(Roche Diagnostics, Mannheim, Germany) and a Total Nucleic Acid Isolation kit (Roche

Diagnostics, Mannheim, Germany). The extracted DNA was eluted in 100 µl of elution buffer, and

stored at -20 °C, until real-time multiplex PCR-assays were performed.

Taxonomic identifications

Identifications of reference strains and strains isolated from clinical samples were determined by

PCR-amplification and sequence analysis of a partial (757 bp) groEL gene sequence, using primers,

StreptogroELd and StreptogroELr, as previously described (Glazunova et al., 2009). PCR-products

were purified and sequenced (GATC Biotech AG, Constance, Germany). The sequences were

compared with the groEL partial sequences of the type strains of the 20 validly published species

of the Mitis group of the genus Streptococcus, using BioNumerics software platform, version 7.5

(Applied Maths, Sint-Martens-Latem, Belgium). A strain was assigned to a given species if the

partial groEL sequence similarity value was above 96%. The strains were also analysed for

presence of the “Xisco” gene, using amplification-primers, Spne-CW-F2 and Spne-CW-R,

according to Salvà-Serra et al. (2017).
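
The groEL assignment rule described above can be illustrated with a minimal Python sketch. It is not the BioNumerics procedure used in this study: the function names, the ungapped percent-identity calculation over pre-aligned sequences of equal length, and the toy sequences are assumptions made purely for illustration. Only the 96% cut-off is taken from the text.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences (assumption: no gaps)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def assign_species(query, type_strains, cutoff=96.0):
    """Return (species, similarity) for the best type-strain match.

    The species is reported only if the best similarity is above the cut-off;
    otherwise (None, best_similarity) is returned.
    """
    best_species, best_sim = None, 0.0
    for species, ref in type_strains.items():
        sim = percent_identity(query, ref)
        if sim > best_sim:
            best_species, best_sim = species, sim
    if best_sim > cutoff:
        return best_species, best_sim
    return None, best_sim


# Toy 100-bp "type strain" sequences (hypothetical, not real groEL data).
TYPE_STRAINS = {
    "S. pneumoniae": "ACGT" * 25,
    "S. mitis": "ACGA" * 25,
}
# Query differing from the toy S. pneumoniae reference at two positions.
query = "TT" + TYPE_STRAINS["S. pneumoniae"][2:]
result = assign_species(query, TYPE_STRAINS)
```

With these toy data the query is 98% identical to the S. pneumoniae reference and is assigned accordingly; a query with no match above 96% would return `None` as the species.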

The taxonomic status of the 422 genome sequences of S. pneumoniae, S. pseudopneumoniae and

S. mitis included in the local database was assessed by determining average nucleotide identity,

based on BLAST (ANIb) (Goris et al., 2007), using JSpeciesWS (Richter et al., 2016), against the

reference genome sequences of the type strains of the different species. Additionally, the matrix

obtained from ANIb similarities of all vs. all genome sequences was used to construct an ANIb-

based dendrogram, according to Gomila et al. (2015). Briefly, the matrix of ANIb values was used,

applying Pearson’s distance correlation and an average linkage construction (UPGMA hierarchical

clustering), using PermutMatrix software (Caraux and Pinloche, 2005). Finally, in silico DNA-

DNA distance values were calculated, using the Genome-to-Genome Distance Calculator (GGDC),

(ggdc.dsmz.de) (Meier-Kolthoff et al., 2013) and the recommended BLAST+ method. The GGDC

results shown are based on the recommended formula 2 (sum of all identities found in high-scoring

segment pairs (HSPs), divided by the overall HSP length), which is independent of genome size

and is, thus, robust when using draft genomes.
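
The ratio underlying formula 2, as worded above, can be sketched in a few lines of Python. This is only an illustration of the quantity described in the text (identities summed over HSPs, divided by the overall HSP length); it does not reproduce the GGDC web service, its BLAST+ filtering, or its conversion of distances into DNA-DNA hybridization estimates. The function name and the example HSP values are hypothetical.

```python
def formula2_identity(hsps):
    """Sum of identities over all HSPs divided by the overall HSP length.

    hsps: iterable of (identities, alignment_length) pairs, one per HSP.
    """
    total_identities = 0
    total_length = 0
    for identities, length in hsps:
        if not 0 <= identities <= length:
            raise ValueError("identities must lie between 0 and the HSP length")
        total_identities += identities
        total_length += length
    if total_length == 0:
        raise ValueError("at least one HSP with non-zero length is required")
    return total_identities / total_length


# Hypothetical HSPs from a pairwise genome comparison.
hsps = [(950, 1000), (1800, 2000), (480, 500)]
identity = formula2_identity(hsps)   # 3230 / 3500
distance = 1.0 - identity            # assumption: distance taken as 1 - identity
```

Because the ratio is normalized by the HSP length rather than by genome length, supplying only part of the HSPs of a comparison (as with an incomplete draft genome) changes the value only insofar as the missing regions differ in identity, which is the property referred to in the text.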

Modified sequetyping protocol

The sequetyping protocol was based on analysis of the capsule polysaccharide synthesis cpsB

region (Leung et al., 2012). In order to obtain sufficient quality for the entire 1,061 bp segment,

two internal primers were designed, wzh-mid-F and wzh-mid1-R, generating two partly

overlapping sequences (Figure 1). The reaction mixture for PCR-assays comprised 0.1 to 10 ng of

DNA template, 1X Taq PCRMasterMix (Qiagen, Hilden, Germany), 1 μM concentration of each

amplification-primer, in a total volume of 25 μL. Primer sequences are listed in Supplementary

Table 1. PCR-amplification was achieved, with an initial cycle of 5 min denaturation at 95°C and

30 cycles of 30 s at 95°C for denaturation, 30 s at 55°C for primer-annealing and 90 s at 72°C for

primer-extension, with a final extension step at 72°C for 5 min. Amplicons were analysed by

electrophoresis in 1% agarose gel. Sequencing reactions were performed using the four primers

(Figure 1).

A database of reference sequences was created for comparative analyses, including the sequences

for each serotype listed by Bentley et al. (2006), as well as the complete cpsB region sequences

extracted from the 329 S. pneumoniae genome sequences. The PCR-amplicon nucleotide

sequences were analysed by similarity analysis, using BioNumerics software platform, version 7.5

(Applied Maths, Sint-Martens-Latem, Belgium). A strain was assigned to a given serotype if the

similarity value was higher than 99% and the similarity of the second highest match was, at least,

1% lower. If the similarity value was shared between two or more serotypes, it was reported as

multiple-matched serotypes.
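
The serotype assignment criteria above can be expressed as a short decision function. The following Python sketch assumes that the best similarity per serotype has already been computed; the function name, the status labels and the handling of a non-tied runner-up within 1% (labelled "ambiguous" here) are assumptions for illustration, since the text specifies only the >99% rule, the 1% margin and the reporting of tied serotypes.

```python
def call_serotype(similarities, min_similarity=99.0, min_margin=1.0):
    """Apply the sequetyping assignment rule to per-serotype similarities.

    similarities: dict mapping serotype -> best percent similarity of the
    query cpsB sequence to the reference sequence(s) of that serotype.
    Returns (status, serotypes).
    """
    if not similarities:
        return "no match", []
    top = max(similarities.values())
    if top <= min_similarity:
        return "unassigned", []
    # Serotypes sharing exactly the top similarity value.
    tied = sorted(s for s, v in similarities.items() if v == top)
    if len(tied) > 1:
        return "multiple", tied
    # Serotypes whose similarity is less than `min_margin` below the top value.
    close = sorted(s for s, v in similarities.items() if top - v < min_margin)
    if len(close) > 1:
        return "ambiguous", close  # assumption: case not specified in the text
    return "assigned", tied


# Hypothetical similarity values for three query sequences.
example_assigned = call_serotype({"19F": 99.8, "19A": 97.5, "6B": 90.1})
example_multiple = call_serotype({"10B": 99.6, "10C": 99.6, "4": 92.0})
example_low = call_serotype({"24F": 97.9, "14": 95.0})
```

In this sketch, the first example is assigned to a single serotype, the second is reported as multiple-matched serotypes, and the third remains unassigned because no reference exceeds 99% similarity.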

BLASTN analyses of the cpsB region were also performed, with respect to the 248 non-

pneumococcal genome sequences, in order to determine if this region is present in other species of

the Mitis group. Only BLAST hits with an E-Value lower than 10e-5 were considered significant.

Additionally, the four sequetyping primers were also analysed, using BLASTN against the 248

non-pneumoniae genome sequences. Only matches covering the entire primer length (100%

coverage) and maximum 2 mismatches were considered to be positive.
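
The primer-matching criterion (full primer length, at most 2 mismatches) can be illustrated with a naive ungapped scan of both strands. This Python sketch is a stand-in for illustration only, not the BLASTN analysis performed in this study; the function names and the toy primer and target sequences are hypothetical, and degenerate bases are not handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_hits(primer, target, max_mismatches=2):
    """Find full-length, ungapped primer matches on both strands of `target`.

    Returns a list of (strand, position, mismatches). Positions on the "-"
    strand refer to coordinates in the reverse-complemented target.
    """
    primer = primer.upper()
    hits = []
    for strand, seq in (("+", target.upper()), ("-", reverse_complement(target))):
        for i in range(len(seq) - len(primer) + 1):
            window = seq[i:i + len(primer)]
            mismatches = sum(p != t for p, t in zip(primer, window))
            if mismatches <= max_mismatches:
                hits.append((strand, i, mismatches))
    return hits


# Toy example: the target carries the primer site with a single mismatch.
toy_primer = "GATTACAGATTACA"
toy_target = "CCCC" + "GATTACAGATTTCA" + "GGGG"
forward_hits = primer_hits(toy_primer, toy_target)
reverse_hits = primer_hits(reverse_complement(toy_primer), toy_target)
```

An amplification would be predicted in silico only when both primers of a pair yield hits in the appropriate orientation and at a plausible distance; that pairing step is omitted here for brevity.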

Serotype identification by multiplex real-time PCR

A multiplex real-time PCR, able to detect 40 different serotypes, was developed and applied. The

assay is based on a protocol published by Centers for Disease Control and Prevention (CDC) (da

Gloria Carvalho et al., 2010), and is similar to the real-time PCR subsequently developed by CDC

(Pimenta et al., 2013). A complete list of primers and probe sequences can be found in

Supplementary Table 2. The multiplex real-time PCR was performed in 384-well format in a

QuantStudio 6 Flex (Applied Biosystems, Carlsbad, CA). Each PCR consisted of a 20 µl reaction volume,

including 4 μl of template DNA, along with 1 µM of each of the forward and reverse primers, 0.85

µM of the probe, 10 µl of 2X Universal Master Mix for DNA targets (Applied Biosystems, Foster

City, CA, USA) and RNase-free water. The Tecan Freedom EVO PCR setup workstation (Life

Sciences, Männedorf, Switzerland) was used to prepare the PCR assays in a 384-well plate. The

reaction conditions were as follows: one initial cycle at 46°C for two minutes, followed by

denaturation at 95°C for 10 minutes and 45 amplification cycles of 95°C for 15 s and 58°C for one

minute. Each multiplex performance was evaluated, using an internal control (cpsA) to verify the

presence of pneumococcal DNA in the sample, as well as two pUC57 plasmids containing each

PCR target amplicon for all serotype systems.

RESULTS

Classification of database sequences

In order to re-evaluate the classifications of the genome sequences of S. pneumoniae, S.

pseudopneumoniae and S. mitis in the genome sequence database, ANIb analyses were performed,

wherein each genome sequence was compared to that of the type strain of each species of the Mitis

group and by comparisons of all genomes to all. The analyses showed that all 328 strains (type

strain excluded) listed as S. pneumoniae in the GenBank database were correctly identified as S.

pneumoniae, i.e., had similarity values greater than 95% (Rosselló-Móra and Amann, 2015). Of

them, 24 sequences exhibited ANIb similarity values ≥99% to the sequence of the type strain, 271

strains exhibited ≥98% similarity and 33 strains exhibited ≥97% similarity. By comparison, only 9

of the 39 sequences from strains listed as S. pseudopneumoniae (type strain excluded) in GenBank

exhibited ANIb values ≥95%, whereas 48 of the 52 sequences from strains listed as S. mitis (type

strain excluded) exhibited ANIb values below 95% and only four strains had ANIb values between

95 and 96%, indicating a significant number of misclassifications of strains for which genome

sequence data had been submitted to GenBank.

Additionally, cluster analysis was done, using the calculated ANIb similarity values for all strains

against all, for S. pseudopneumoniae and S. mitis, including, as well, the type strains of the other

13 species of the Mitis group included in the genomes database. With this analysis, a dendrogram

was generated, to visualize the relationships among the strains, with respect to the type strains of

the different species (Supplementary Figure 1). In the case of S. pneumoniae, all genome

sequences clustered most closely with the type strain of S. pneumoniae, confirming the taxonomic

designations for the genome sequences. However, only nine of the 39 genome sequences listed as

S. pseudopneumoniae and thirty-six of the 52 genome sequences listed as S. mitis in the database

clustered in proximity to the type strain of the respective species and were, therefore, taxonomically

designated as S. pseudopneumoniae and S. mitis, while the remaining 46 strains clustered closer to

other species.
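
The clustering step (a distance derived from Pearson correlation of ANIb profiles, followed by average-linkage UPGMA) can be sketched in plain Python as follows. This is an illustrative reimplementation under stated assumptions, not the PermutMatrix workflow: the distance is taken as 1 minus the Pearson correlation between rows of the ANIb matrix, the function names are hypothetical, and the 4 x 4 matrix contains invented values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("profile with zero variance")
    return cov / (sx * sy)


def upgma(labels, ani):
    """Average-linkage clustering of ANIb profiles; returns a nested tuple tree."""
    n = len(labels)
    # Pairwise distance between genomes: 1 - Pearson correlation of their rows.
    dist = [[0.0 if i == j else 1.0 - pearson(ani[i], ani[j]) for j in range(n)]
            for i in range(n)]
    clusters = {i: [i] for i in range(n)}      # cluster id -> member indices
    trees = {i: labels[i] for i in range(n)}   # cluster id -> subtree
    next_id = n

    def average_distance(a, b):
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(dist[i][j] for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average pairwise distance.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: average_distance(*pair))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        trees[next_id] = (trees.pop(a), trees.pop(b))
        next_id += 1
    return trees[next_id - 1]


# Invented ANIb values (%) for two pneumococcus-like and two mitis-like genomes.
labels = ["Spn_1", "Spn_2", "Smi_1", "Smi_2"]
ani = [
    [100, 98, 92, 91],
    [98, 100, 91, 92],
    [92, 91, 100, 96],
    [91, 92, 96, 100],
]
tree = upgma(labels, ani)
```

With these invented values, the two pneumococcus-like genomes are joined first, then the two mitis-like genomes, and the two groups are joined last, mirroring how genomes are read off a dendrogram relative to the type strains.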

The GGDC analyses, comprising comparison with type strains for each species, showed that all S.

pneumoniae genome sequences had in silico hybridization values higher than 70%, confirming

their taxonomic identities. The GGDC analyses for S. pseudopneumoniae matched the results

obtained by ANIb analysis (Table 1), whereas only one of the genome sequences of S. mitis

exhibited a hybridization value higher than 70%; for the rest of the genome sequences, the in silico

DNA-DNA hybridization values were lower than 70% and inconclusive for confirming species-

level identifications (Table 2).
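
The two genome-based criteria used above (ANIb greater than 95% and in silico DNA-DNA hybridization greater than 70%, each relative to the type strain) can be combined in a small helper. The sketch below is hypothetical: the class, the function, the status labels and the example values are assumptions for illustration and are not data from this study.

```python
from dataclasses import dataclass


@dataclass
class GenomeComparison:
    genome: str    # genome identifier (hypothetical)
    species: str   # species of the type strain used as reference
    anib: float    # ANIb against the type strain, in percent
    ddh: float     # in silico DNA-DNA hybridization estimate, in percent


def species_support(comparison, ani_cutoff=95.0, ddh_cutoff=70.0):
    """Summarize whether ANIb and in silico DDH support the listed species."""
    ani_ok = comparison.anib > ani_cutoff
    ddh_ok = comparison.ddh > ddh_cutoff
    if ani_ok and ddh_ok:
        return "supported by both"
    if ani_ok or ddh_ok:
        return "supported by one criterion"
    return "not supported"


# Invented example values.
examples = [
    GenomeComparison("genome_A", "S. pneumoniae", 98.4, 86.0),
    GenomeComparison("genome_B", "S. mitis", 95.3, 62.0),
    GenomeComparison("genome_C", "S. mitis", 92.1, 48.5),
]
summary = {c.genome: species_support(c) for c in examples}
```

A genome that fails both thresholds is not necessarily misclassified, as the S. mitis results show; in such cases the all-versus-all clustering provides the additional evidence.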

Partial groEL sequence analyses using the region of the gene suggested by Glazunova et al. (2009)

were also performed. The 757 bp groEL sequence was extracted from all genome sequences of S.

pneumoniae, S. mitis and S. pseudopneumoniae and similarity values of the sequences were

calculated, with respect to the type strains of the 14 species of the Mitis group of Streptococcus.

By this analysis of partial groEL sequences, with sequence similarities above 96% (cut-off value)

with the type strains of the respective species, all 328 genome sequences listed as representing S.

pneumoniae genomes (type strain excluded) were identified as S. pneumoniae, whereas only three

of 39 genomes listed as S. pseudopneumoniae (type strain excluded) were identified as S.

pseudopneumoniae (Table 1), and 28 of the 52 sequences listed as S. mitis (type strain excluded)

were identified as S. mitis. In four of the strains, the groEL gene could not be found in the genome

sequence (Table 2). The classifications of the remaining 17 genome sequences were ambiguous,

with non-definitive similarity values for S. mitis, as well as S. pneumoniae and S.

pseudopneumoniae, allowing no clear species-level identifications (Table 2). Finally, the “Xisco”

gene was not detected in any genome sequence listed as S. mitis or S. pseudopneumoniae, whereas

the “Xisco” gene was present in all genome sequences identified as S. pneumoniae. A summary of

the results for the genome sequences that were taxonomically incorrect but listed in GenBank as S.

pseudopneumoniae and S. mitis are presented in Supplementary Table 3.

Comparison of the identifications obtained by genome sequence ANIb analysis with those obtained

by partial groEL sequence analysis revealed discrepancies, suggesting that groEL may not be as

reliable a marker as anticipated for identification of the closely related species of the Mitis group

of the genus Streptococcus.

Identification of Streptococcus pneumoniae in culture collections and clinical samples

In cultivated and isolated clinical strains (n=50) as well as in the 138 strains from the CCUG

previously identified as S. pneumoniae, the “Xisco” gene was present in 100% of strains.

Furthermore, groEL similarity values were observed to be greater than 98% in all strains and

greater than 99% in two-thirds of the analysed strains, confirming by two independent techniques

that the strains were correctly identified as S. pneumoniae.

Serotype identification

The sequetyping technique of Leung et al. (2012) was modified by using two internal primers to

generate two partially overlapping amplicons, representing the whole 1,017 bp cpsB-region (Figure

1). To assess its accuracy, sequetyping was evaluated in silico by analysing cpsB sequences that

were extracted from the 329 genome sequences of S. pneumoniae in the local genome sequence

database. The serotypes of 261 (80%) of these genomes were identified, with similarity values

greater than 99% to a reference sequence. In 15 strains, similarity matches were too low for a

serotype to be assigned: 8 strains exhibited best matches to serotypes 10B-10C

(less than 97%) and 7 strains exhibited best matches to serotype 24F (less than 98%). The serotypes

of 52 genomes could not be determined. Thirty-six of these genomes were previously described as

‘non-typeable’ pneumococci (Hathaway et al., 2004) and, therefore, the cpsB sequence was not

present (Table 3). For the remaining 16 genomes, the serotypes could not be determined, due to

low similarity values, with respect to the reference sequences.

BLASTN analyses of the cpsB region, with respect to the 248 non-pneumoniae genome sequences,

gave 22 positive hits with E-Values lower than 10e-5 but with similarities ranging between 80 and

93% (Supplementary Table 4), suggesting that the cpsB region could also be present in other

species of the Mitis group. Analysis of the probability for the four sequetyping primers to amplify

among the 248 non-pneumoniae genome sequences considering a maximum of 2 mismatches

showed that the PCR including the primers cps1 - wzh-mid-R could lead to 2 positive

amplifications, whereas the PCR reaction including the primers wzh-mid-F – cps2 could lead to 13

positive amplifications. However, the whole cpsB region would be expected to be amplified in only

two cases, in S. mitis SK579 and S. mitis SK616 (Supplementary Table 4).

The sequetyping was applied on the 138 S. pneumoniae strains from the CCUG, for which the

serotypes had previously been determined by the Quellung reaction or the antiserum panel gel-

precipitation protocol. The determined sequences were analysed by BLAST searches, and

similarities were recorded. A sequence was assigned to a specific serotype if the similarity value

was greater than 99% and the next-best similarity match was, at least, 1% lower. In 140 strains

(97%), the serotype by sequetyping matched the results obtained by the reference methods.

Discrepancies were observed for five strains: CCUG 1749 (17A) was identified as 10A;

CCUG 5906 (36) was identified as 15B; CCUG 20653 (48) was identified as 6B; CCUG 27692

(15A) was identified as 19B; and CCUG 55117 (16A) was identified as 48.

Finally, the serotypes of 50 strains isolated from clinical samples were determined, using

sequetyping, multiplex real-time PCR and antiserum panel gel-precipitation protocol performed at

the Public Health Agency of Sweden. A serotype was identified by real-time PCR for 36 of the 50

strains (72%), whereas the serotypes were identified only by sequetyping for the remaining 14

strains (28%), showing serotypes that were not targeted by the real-time PCR assay. In all cases,

the obtained results agreed with those obtained by the antiserum panel protocol (Table 4).

Serogroup 6 differentiation

A dendrogram, based on the sequence of the entire cpsB region (1,017 bp) from all strains

classified as serogroup 6, was created (Supplementary Figure 2). The sequences did not form

distinct clusters, indicating that serotype differentiation among serogroup 6 is not possible by

sequetyping of this region and that an alternative method is needed. A DNA sequencing-dependent

approach was used for differentiating the serotypes 6A, 6B, 6C and 6D; a schema of the suggested

protocol is shown in Figure 2. Firstly, a PCR-amplification, using the primers, wciP374F (this

study) and wciP-R (Jin et al., 2009), was performed and the PCR-product was sequenced, using

primer, wciP374F. Primer sequences are listed in Supplementary Table 1. The sequence product

allowed visualization of the single nucleotide polymorphism (SNP) that distinguishes serotype

6A/6C (guanine in position 584) and serotype 6B/6D (adenine in position 584). Subsequently, for

differentiating serotype 6A and 6C, a second PCR, using primers, Del6Cwzy_Fv2 (this study) and

Del6Cwzy_R (Jin et al., 2009), followed by Sanger sequencing, using primer Del6Cwzy_Fv2, was

performed. A 6 bp deletion in the gene wzy, characteristic for serotype 6C, was detected. The in

silico PCR analysis showed that it is possible to obtain PCR-products for serotypes 6A, 6B and 6C

but not for serotype 6D (Table 5), although the deletion was present only in serotype 6C. For

differentiating serotype 6B from 6D, a PCR, using primers, wciN_6AB_F and wciN_6AB_R (this

study), was performed. This PCR is shown to be unique for serotype 6B; thus, if the PCR-product

was produced, the strain was assigned to serotype 6B, whereas, if the PCR was negative, the strain

was assigned to serotype 6D. To confirm serotype 6D, two additional PCR-assays, targeting

the wciNbeta region, were performed: the first, using primers wciNbetaS1/wciNbetaA2 (Jin et al.,

2009), and the second, using primers wciNbetaS2/wciNbetaA1 (Jin et al., 2009). If at least one of

the PCR-assays was positive, the strain was confirmed as serotype 6D. Details of the analysis are

presented in Table 5.
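
The decision steps of the protocol described above can be summarised as a small function. This is a minimal sketch, not the authors' implementation: the class, field and function names are hypothetical, and it assumes that each laboratory result (the base read at position 584 of wciP, the presence of the 6 bp wzy deletion and the PCR outcomes) has already been scored by the analyst.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Serogroup6Results:
    """Scored observations for one serogroup 6 strain (hypothetical field names)."""
    wciP_base_584: str                       # base read with primer wciP374F: "G" or "A"
    wzy_6bp_deletion: Optional[bool] = None  # Del6Cwzy sequencing result (G branch)
    wciN_6AB_product: Optional[bool] = None  # wciN_6AB_F / wciN_6AB_R PCR result (A branch)
    wciNbeta_s1_a2: Optional[bool] = None    # wciNbetaS1 / wciNbetaA2 PCR result
    wciNbeta_s2_a1: Optional[bool] = None    # wciNbetaS2 / wciNbetaA1 PCR result


def assign_serogroup6_serotype(r: Serogroup6Results) -> str:
    """Follow the stepwise protocol: wciP SNP first, then the branch-specific assay."""
    base = r.wciP_base_584.upper()
    if base == "G":
        # Guanine at position 584: serotype 6A or 6C; the wzy deletion marks 6C.
        if r.wzy_6bp_deletion is None:
            return "6A/6C (wzy sequencing needed)"
        return "6C" if r.wzy_6bp_deletion else "6A"
    if base == "A":
        # Adenine at position 584: serotype 6B or 6D; the wciN_6AB product marks 6B.
        if r.wciN_6AB_product is None:
            return "6B/6D (wciN_6AB PCR needed)"
        if r.wciN_6AB_product:
            return "6B"
        # Negative wciN_6AB PCR: assigned to 6D, confirmed by at least one wciNbeta PCR.
        confirmed = bool(r.wciNbeta_s1_a2) or bool(r.wciNbeta_s2_a1)
        return "6D" if confirmed else "6D (unconfirmed)"
    raise ValueError(f"unexpected base at wciP position 584: {r.wciP_base_584!r}")


if __name__ == "__main__":
    print(assign_serogroup6_serotype(Serogroup6Results("G", wzy_6bp_deletion=True)))
    print(assign_serogroup6_serotype(
        Serogroup6Results("A", wciN_6AB_product=False, wciNbeta_s2_a1=True)))
```

Returning an explicit "needed" or "unconfirmed" label, rather than guessing, keeps incomplete work-ups visible in a results table.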

The sequence analyses performed for the target regions of the genomes showed that the regions

where the primers anneal are highly conserved; thus, PCR-amplification is expected to be specific

and reliable. The proposed protocol was tested in all CCUG strains identified as serogroup 6 by

Quellung reaction and in the clinical isolates identified as serogroup 6. The proposed protocol

gave the same results as the Quellung reaction in 9 strains, except for CCUG 3114,

which was previously described as serotype 6A and was reclassified

as serotype 6C.

DISCUSSION

Correct identifications of S. pneumoniae strains are crucial for choosing the proper treatment

options and for assessment of the burden of disease. As a general standard, routine culture-based

identification of S. pneumoniae consists of bile solubility and optochin susceptibility tests (Richter

et al., 2008). There is a percentage of isolates that give inconsistent results with optochin

susceptibility and bile solubility and are referred to as ‘atypical’ pneumococci. In addition, similar

biochemical properties are also present in a significant proportion of other closely related species

of the Mitis group, such as S. mitis and S. pseudopneumoniae, especially when samples are from

respiratory sites (Keith et al., 2006; Rolo et al., 2013). The in silico analysis performed in this study

showed that identification of S. pneumoniae by analysis of the groEL partial sequence was possible

and reliable, whereas S. pseudopneumoniae and S. mitis could be misclassified as S. pneumoniae,

suggesting that groEL is an unreliable marker for differentiating S. pneumoniae from its closest

related species. In the studies of Glazunova et al. (2009) and Teng et al. (2002), where groEL was

proposed to differentiate S. pneumoniae from other species of the Mitis group, few strains of each

species were used for the analysis; in our study, we included all the genome sequences available

in the database at the time the study was performed.

These results point to the risk that partial gene sequence analysis may lead to misclassification, for

example, due to horizontal gene transfer. Horizontal gene transfer and homologous recombination,

involving groEL, between species most likely occur, as has been suggested previously for the sodA

and rpoB genes (Varghese et al., 2017).
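
The risk of a best-match call on a partial gene sequence can be illustrated with the groEL similarity values reported in Table 1. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the "call" is simply the species with the highest groEL similarity, which is the simplest possible decision rule and not necessarily the one used in any published protocol.

```python
from typing import Dict, List, Tuple


def rank_groel_matches(similarities: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (species, percent similarity) pairs, highest similarity first."""
    return sorted(similarities.items(), key=lambda item: item[1], reverse=True)


def best_match_call(similarities: Dict[str, float]) -> str:
    """Name the species with the highest groEL similarity (naive best-match rule)."""
    return rank_groel_matches(similarities)[0][0]


# groEL similarities (%) from Table 1; both strains were confirmed as
# S. pseudopneumoniae by ANIb and GGDC analyses.
strain_22725 = {"S. pneumoniae": 99.6, "S. pseudopneumoniae": 93.9, "S. mitis": 93.4}
strain_1321 = {"S. pseudopneumoniae": 100.0, "S. mitis": 95.8, "S. pneumoniae": 94.1}

if __name__ == "__main__":
    for name, sims in (("22725", strain_22725), ("1321", strain_1321)):
        print(name, "groEL best match:", best_match_call(sims))
```

For strain 22725, the best groEL match is S. pneumoniae even though the genome-wide analyses place the strain in S. pseudopneumoniae, whereas strain 1321 is called consistently by both approaches.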

High degrees of horizontal gene transfer and homologous recombination (Chi et al., 2007; Jensen

et al., 2016; Kilian et al., 2008) between S. pneumoniae and commensal viridans group streptococci

have given rise to genotypic ambiguities between S. pneumoniae and closely related species, such

as S. mitis, S. pseudopneumoniae and S. oralis (Kilian et al., 2008; Kilian et al., 2014; Whatmore

et al., 2000). Multi-Locus Sequence Analysis (MLSA) for the Viridans group streptococci

developed by Bishop et al. (2009) and core genome phylogenetic analyses

(Jensen et al., 2016) are genome-based techniques that can differentiate the Viridans group

streptococci to the species level. ANIb similarity determination is gaining relevance as a robust

measure of relatedness between strains, with potential for confirming the phylogenetic and taxonomic

relationships underlying bacterial identification (Konstantinidis and Tiedje, 2005). An ANIb similarity

threshold above 95%, with respect to species type strains, is proposed to provide species-level

identifications of given genomes (Kim et al., 2014; Richter and Rosselló-Móra, 2009). In our

study, the genome sequences of S. pneumoniae and S. pseudopneumoniae fulfilled the suggested

thresholds, whereas only four of the S. mitis genome sequences fulfilled this criterion. However,

cluster analyses, derived from all vs. all ANIb similarity values of the genome sequences,

allowed discrimination of the different species, by clustering with respect to species type strains.

The in silico DNA-DNA hybridization calculated with the GGDC was also inconclusive for

identification of S. mitis genome sequences.
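
The threshold rule discussed above can be written as a short check. This is a sketch under stated assumptions: the function name is hypothetical, "above 95%" is read as strictly greater than 95.0, and the ANIb columns of Tables 1 and 2 are taken to be similarities to the respective species type strains.

```python
from typing import Dict, Optional

ANIB_SPECIES_THRESHOLD = 95.0  # percent; "above 95%" is read here as strictly greater


def species_by_anib(anib_to_type_strains: Dict[str, float],
                    threshold: float = ANIB_SPECIES_THRESHOLD) -> Optional[str]:
    """Return the species whose type strain gives the highest ANIb value,
    or None when that value does not exceed the threshold."""
    if not anib_to_type_strains:
        raise ValueError("at least one ANIb value is required")
    species, value = max(anib_to_type_strains.items(), key=lambda item: item[1])
    return species if value > threshold else None


# ANIb values (%) taken from Table 1 (strain 1321) and Table 2 (SK271, SK1080).
strain_1321 = {"S. pseudopneumoniae": 98.4, "S. pneumoniae": 94.2, "S. mitis": 91.8}
strain_sk271 = {"S. mitis": 95.6, "S. pseudopneumoniae": 92.3, "S. pneumoniae": 91.5}
strain_sk1080 = {"S. mitis": 92.9, "S. pseudopneumoniae": 93.2, "S. pneumoniae": 92.7}

if __name__ == "__main__":
    for name, values in (("1321", strain_1321), ("SK271", strain_sk271),
                         ("SK1080", strain_sk1080)):
        print(name, species_by_anib(values))
```

Strain SK1080 illustrates why the threshold alone was inconclusive for S. mitis: none of its ANIb values exceeds 95%, so the rule returns no species and the assignment has to rest on clustering with the type strains.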

The recently described “Xisco” gene, which is detected by a single PCR, seems to be a good marker

for the correct identification of S. pneumoniae and differentiation from the closely related species

S. pseudopneumoniae and S. mitis (Salvà-Serra et al., 2017). Both in the in silico and in vitro

analyses, the “Xisco” gene was present in all S. pneumoniae strains and absent in all genomes and

strains of the non-pneumococcus Mitis group species. Other targets have been proposed to be

specific for pneumococci over the past two decades, such as pneumolysin (ply) (McAvin et al., 2001),

autolysin (lytA) (Corless et al., 2001), pneumococcal surface adhesin A (psaA) (Morrison et al.,

2000), and penicillin binding protein (pbp) (O'Neill et al., 1999), among others. However, the

“Xisco” gene seems to be more robust and distinguishes S. pneumoniae from the other species of

the Mitis group more reliably. Since recombination in the Mitis group may occur, it is potentially

unreliable to use a single gene biomarker for identification of S. pneumoniae.

Identification of S. pneumoniae serogroup/serotype is important for surveillance of strains in

disease carriage and for strategies in vaccine development (O'Brien et al., 2009). For serotyping S.

pneumoniae, the Quellung reaction method is considered the ‘gold standard’, although it can be

performed only on viable isolates, needs expertise and is expensive. Recently, molecular

techniques, such as genotypic typing methods targeting serotype-specific regions of the cps genes,

including multiplex PCR (Brito et al., 2003; da Gloria Carvalho et al., 2010; Jourdain et al., 2011;

Pai et al., 2006; Richter et al., 2013) and multiplex real-time Q-PCR (Pimenta et al., 2013), have

been described. These methods allow the detection of multiple serotypes but are still relatively

laborious, considering that nearly 100 different serotypes are known today. Most of these

methods were designed to be able to identify the serotypes that have been included in vaccines or

which are most common in given geographic areas. However, in surveillance studies, replacement

of vaccine serotypes by non-vaccine serotypes has been reported in regions where pneumococcal

conjugate vaccines are implemented (Hicks et al., 2007; Weinberger et al., 2011), raising the

necessity for simplified methods that allow detection of as many serotypes as possible, as well as

recognition of newly-evolved serotypes.

The recently described sequetyping technique by Leung et al. (2012) has the advantage of being

able to detect a broad range of serotypes in one analysis. However, in our hands it was difficult to

obtain adequate amplicons and, as pointed out by Leung et al., the size of the amplicon (1,061 bp)

is too large for the current Sanger sequencing protocols. Therefore, we added two internal primers

to amplify and sequence two fragments, enabling the whole cpsB sequence to be obtained with good

quality. This strategy allows distinction between the serotypes 18B and 18C, but not differentiation

within the serogroups 6 (6A, 6B, 6C, and 6D) and 7 (7F and 7A). An advantage of sequetyping is

that it can be based on data from the publicly available GenBank database, although the nature of

handling of data deposited with this database implies potential risk for incorrect assignment of

serotype designations, as well as incorrect taxonomic classifications of strains. We confirmed that

cpsB sequetyping gave correct serotype results by performing an in silico sequetyping of 329

available S. pneumoniae genomes and also a BLAST similarity analysis to further document the

accuracy of the sequetyping method.
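
Sequencing the cpsB region as two fragments means that the two reads have to be joined before the sequence is compared with reference data. The sketch below shows one simple way to do this and is not the procedure used in this study: the function name and the toy sequences are hypothetical, and it assumes that both reads are on the same strand, are already quality-trimmed and overlap without mismatches. Real chromatogram assembly software tolerates sequencing errors.

```python
def merge_overlapping_reads(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two reads on the longest exact overlap between the end of `left`
    and the start of `right`; raise ValueError if no such overlap exists."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    left, right = left.upper(), right.upper()
    longest_possible = min(len(left), len(right))
    # Try the longest candidate overlap first, then progressively shorter ones.
    for size in range(longest_possible, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    raise ValueError(f"no exact overlap of at least {min_overlap} bases found")


if __name__ == "__main__":
    # Toy reads (not real cpsB sequence) sharing the six bases "GGTTCA".
    upstream_read = "ACGTACGGTTCA"
    downstream_read = "GGTTCATTGACC"
    print(merge_overlapping_reads(upstream_read, downstream_read, min_overlap=4))
```

Requiring a minimum overlap guards against joining reads on a short chance match; the default of 20 bases is an arbitrary illustrative value.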

The utility of DNA-based methods for serotyping can be limited, due to inherent difficulties of

differentiations within serogroups, which is of importance because available vaccines may include

some, but not all, serotypes of a serogroup. Recently, PCR-based protocols for improved

discrimination of serogroup 18 and serotypes 22F and 33F were described (Gillis et al., 2017;

Tanmoy et al., 2016). Here we present a modified protocol for discriminating serogroup 6

serotypes, based on sequence analysis. The distinction is important, given the significant increase

of pneumococcal infections of serotype 6C after introduction of conjugate vaccines.

The sequetyping was applied on 50 pneumococcal clinical isolates, which were also analysed by

real-time Q-PCR and serotyping by an antiserum panel at the Public Health Agency of Sweden.

The comparisons showed good agreement between the assays, similar to what was observed by

Dube et al. (2015). The results confirmed that sequetyping was also able to detect

several non-vaccine serotypes. These serotypes were not detected by real-time Q-PCR

because this assay identifies only those serotypes that are specifically targeted. However, the use

of the sequetyping method is limited to single isolates, because mixed serotypes are difficult to

distinguish when analysing the sequence chromatograms. In contrast, the real-time Q-PCR can be

used with total DNA extracts from samples and is, therefore, able to recognize the presence of

multiple serotypes in a given sample.

In conclusion, the presence of the “Xisco” gene, genome sequence ANIb, in silico DNA-DNA

hybridization and targeted groEL comparative sequence analyses are reliable methods for

identification of pneumococci. Serotyping by using PCR- and DNA sequence-based methods is

highly useful in cases where access to traditional methods is limited and when cultivation of

isolates is negative. However, since S. pneumoniae and the related species of the Mitis group of

Streptococcus undergo constant recombination, different techniques need to be

applied in order to verify the reliability of analyses.

ACKNOWLEDGMENTS

This work was supported by the European Commission: TAILORED-Treatment (project number

602860; www.tailored-treatment.eu). The Culture Collection University of Gothenburg (CCUG)

is supported by the Department of Clinical Microbiology, Sahlgrenska University Hospital and the

Sahlgrenska Academy of the University of Gothenburg. FS-S was supported by stipends for Basic

and Advanced Research from the CCUG, through the Institute of Biomedicine, Sahlgrenska

Academy, University of Gothenburg.

Declarations of interest: none.

REFERENCES

Benson, D.A., Cavanaugh, M., Clark, K., Karsch-Mizrachi, I., Ostell, J., Pruitt, K.D., Sayers, E.W.,

2017. GenBank. Nucleic acids research.

Bentley, S.D., Aanensen, D.M., Mavroidi, A., Saunders, D., Rabbinowitsch, E., Collins, M.,

Donohoe, K., Harris, D., Murphy, L., Quail, M.A., Samuel, G., Skovsted, I.C., Kaltoft, M.S.,

Barrell, B., Reeves, P.R., Parkhill, J., Spratt, B.G., 2006. Genetic analysis of the capsular

biosynthetic locus from all 90 pneumococcal serotypes. PLoS genetics 2, e31.

Bishop, C.J., Aanensen, D.M., Jordan, G.E., Kilian, M., Hanage, W.P., Spratt, B.G., 2009.

Assigning strains to bacterial species via the internet. BMC biology 7, 3.

Brito, D.A., Ramirez, M., de Lencastre, H., 2003. Serotyping Streptococcus pneumoniae by

multiplex PCR. Journal of clinical microbiology 41, 2378-2384.

Caraux, G., Pinloche, S., 2005. PermutMatrix: a graphical environment to arrange gene expression

profiles in optimal linear order. Bioinformatics 21, 1280-1281.

Chi, F., Nolte, O., Bergmann, C., Ip, M., Hakenbeck, R., 2007. Crossing the barrier: evolution and

spread of a major class of mosaic pbp2x in Streptococcus pneumoniae, S. mitis and S. oralis.

International journal of medical microbiology : IJMM 297, 503-512.

GBD Diarrhoeal Diseases Collaborators, 2017. Estimates of global, regional, and national morbidity, mortality,

and aetiologies of diarrhoeal diseases: a systematic analysis for the Global Burden of Disease Study

2015. Lancet Infect Dis 17, 909-948.

Corless, C.E., Guiver, M., Borrow, R., Edwards-Jones, V., Fox, A.J., Kaczmarski, E.B., 2001.

Simultaneous detection of Neisseria meningitidis, Haemophilus influenzae, and Streptococcus

pneumoniae in suspected cases of meningitis and septicemia using real-time PCR. Journal of

clinical microbiology 39, 1553-1558.

da Gloria Carvalho, M., Pimenta, F.C., Jackson, D., Roundtree, A., Ahmad, Y., Millar, E.V.,

O'Brien, K.L., Whitney, C.G., Cohen, A.L., Beall, B.W., 2010. Revisiting pneumococcal carriage

by use of broth enrichment and PCR techniques for enhanced detection of carriage and serotypes.

Journal of clinical microbiology 48, 1611-1618.

Diao, W.Q., Shen, N., Yu, P.X., Liu, B.B., He, B., 2016. Efficacy of 23-valent pneumococcal

polysaccharide vaccine in preventing community-acquired pneumonia among immunocompetent

adults: A systematic review and meta-analysis of randomized trials. Vaccine 34, 1496-1503.

Dube, F.S., van Mens, S.P., Robberts, L., Wolter, N., Nicol, P., Mafofo, J., Africa, S., Zar, H.J.,

Nicol, M.P., 2015. Comparison of a Real-Time Multiplex PCR and Sequetyping Assay for

Pneumococcal Serotyping. PloS one 10, e0137349.

Esposito, S., Principi, N., 2015. Impacts of the 13-Valent Pneumococcal Conjugate Vaccine in

Children. J Immunol Res 2015, 591580.

Fry, R.M., Greaves, R.I., 1951. The survival of bacteria during and after drying. J Hyg (Lond) 49,

220-246.

Geno, K.A., Gilbert, G.L., Song, J.Y., Skovsted, I.C., Klugman, K.P., Jones, C., Konradsen, H.B.,

Nahm, M.H., 2015. Pneumococcal Capsules and Their Types: Past, Present, and Future. Clin

Microbiol Rev 28, 871-899.

Gillis, H.D., Demczuk, W.H.B., Griffith, A., Martin, I., Warhuus, M., Lang, A.L.S., ElSherif, M.,

McNeil, S.A., LeBlanc, J.J., 2017. PCR-based discrimination of emerging Streptococcus

pneumoniae serotypes 22F and 33F. J Microbiol Methods 144, 99-106.

Glazunova, O.O., Raoult, D., Roux, V., 2009. Partial sequence comparison of the rpoB, sodA,

groEL and gyrB genes within the genus Streptococcus. International journal of systematic and

evolutionary microbiology 59, 2317-2322.

Gomila, M., Pena, A., Mulet, M., Lalucat, J., Garcia-Valdes, E., 2015. Phylogenomics and

systematics in Pseudomonas. Front Microbiol 6, 214.

Goris, J., Konstantinidis, K.T., Klappenbach, J.A., Coenye, T., Vandamme, P., Tiedje, J.M., 2007.

DNA-DNA hybridization values and their relationship to whole-genome sequence similarities.

International journal of systematic and evolutionary microbiology 57, 81-91.

Hathaway, L.J., Stutzmann Meier, P., Battig, P., Aebi, S., Muhlemann, K., 2004. A homologue of

aliB is found in the capsule region of nonencapsulated Streptococcus pneumoniae. Journal of

bacteriology 186, 3721-3729.

Hicks, L.A., Harrison, L.H., Flannery, B., Hadler, J.L., Schaffner, W., Craig, A.S., Jackson, D.,

Thomas, A., Beall, B., Lynfield, R., Reingold, A., Farley, M.M., Whitney, C.G., 2007. Incidence

of pneumococcal disease due to non-pneumococcal conjugate vaccine (PCV7) serotypes in the

United States during the era of widespread PCV7 vaccination, 1998-2004. The Journal of infectious

diseases 196, 1346-1354.

Hoshino, T., Fujiwara, T., Kilian, M., 2005. Use of phylogenetic and phenotypic analyses to

identify nonhemolytic streptococci isolated from bacteremic patients. Journal of clinical

microbiology 43, 6073-6085.

Jauneikaite, E., Tocheva, A.S., Jefferies, J.M., Gladstone, R.A., Faust, S.N., Christodoulides, M.,

Hibberd, M.L., Clarke, S.C., 2015. Current methods for capsular typing of Streptococcus

pneumoniae. J Microbiol Methods 113, 41-49.

Jensen, A., Scholz, C.F., Kilian, M., 2016. Re-evaluation of the taxonomy of the Mitis group of the

genus Streptococcus based on whole genome phylogenetic analyses, and proposed reclassification

of Streptococcus dentisani as Streptococcus oralis subsp. dentisani comb. nov., Streptococcus

tigurinus as Streptococcus oralis subsp. tigurinus comb. nov., and Streptococcus oligofermentans

as a later synonym of Streptococcus cristatus. International journal of systematic and evolutionary

microbiology 66, 4803-4820.

Jin, P., Xiao, M., Kong, F., Oftadeh, S., Zhou, F., Liu, C., Gilbert, G.L., 2009. Simple, accurate,

serotype-specific PCR assay to differentiate Streptococcus pneumoniae serotypes 6A, 6B, and 6C.

Journal of clinical microbiology 47, 2470-2474.

Johnson, H.L., Deloria-Knoll, M., Levine, O.S., Stoszek, S.K., Freimanis Hance, L., Reithinger,

R., Muenz, L.R., O'Brien, K.L., 2010. Systematic evaluation of serotypes causing invasive

pneumococcal disease among children under five: the pneumococcal global serotype project. PLoS

Med 7.

Jourdain, S., Dreze, P.A., Vandeven, J., Verhaegen, J., Van Melderen, L., Smeesters, P.R., 2011.

Sequential multiplex PCR assay for determining capsular serotypes of colonizing S. pneumoniae.

BMC Infect Dis 11, 100.

Kawamura, Y., Hou, X.G., Sultana, F., Miura, H., Ezaki, T., 1995. Determination of 16S rRNA

sequences of Streptococcus mitis and Streptococcus gordonii and phylogenetic relationships

among members of the genus Streptococcus. International journal of systematic bacteriology 45,

406-408.

Kawamura, Y., Whiley, R.A., Shu, S.E., Ezaki, T., Hardie, J.M., 1999. Genetic approaches to the

identification of the mitis group within the genus Streptococcus. Microbiology 145 ( Pt 9), 2605-

2613.

Keith, E.R., Podmore, R.G., Anderson, T.P., Murdoch, D.R., 2006. Characteristics of

Streptococcus pseudopneumoniae isolated from purulent sputum samples. Journal of clinical

microbiology 44, 923-927.

Kilian, M., Poulsen, K., Blomqvist, T., Havarstein, L.S., Bek-Thomsen, M., Tettelin, H., Sorensen,

U.B., 2008. Evolution of Streptococcus pneumoniae and its close commensal relatives. PloS one

3, e2683.

Kilian, M., Riley, D.R., Jensen, A., Bruggemann, H., Tettelin, H., 2014. Parallel evolution of

Streptococcus pneumoniae and Streptococcus mitis to pathogenic and mutualistic lifestyles. mBio

5, e01490-14.

Kim, M., Oh, H.S., Park, S.C., Chun, J., 2014. Towards a taxonomic coherence between average

nucleotide identity and 16S rRNA gene sequence similarity for species demarcation of prokaryotes.

International journal of systematic and evolutionary microbiology 64, 346-351.

Konstantinidis, K.T., Tiedje, J.M., 2005. Genomic insights that advance the species definition for

prokaryotes. Proceedings of the National Academy of Sciences of the United States of America

102, 2567-2572.

LeBlanc, J.J., ElSherif, M., Ye, L., MacKinnon-Cameron, D., Li, L., Ambrose, A., Hatchette, T.F.,

Lang, A.L., Gillis, H., Martin, I., Andrew, M.K., Boivin, G., Bowie, W., Green, K., Johnstone, J.,

Loeb, M., McCarthy, A., McGeer, A., Moraca, S., Semret, M., Stiver, G., Trottier, S., Valiquette,

L., Webster, D., McNeil, S.A., Serious Outcomes Surveillance Network of the Canadian

Immunization Research Network, 2017. Burden of vaccine-preventable pneumococcal disease in

hospitalized adults: A Canadian Immunization Research Network (CIRN) Serious Outcomes

Surveillance (SOS) network study. Vaccine 35, 3647-3654.

Leung, M.H., Bryson, K., Freystatter, K., Pichon, B., Edwards, G., Charalambous, B.M., Gillespie,

S.H., 2012. Sequetyping: serotyping Streptococcus pneumoniae by a single PCR sequencing

strategy. Journal of clinical microbiology 50, 2419-2427.

Loman, N.J., Gladstone, R.A., Constantinidou, C., Tocheva, A.S., Jefferies, J.M., Faust, S.N.,

O'Connor, L., Chan, J., Pallen, M.J., Clarke, S.C., 2013. Clonal expansion within pneumococcal

serotype 6C after use of seven-valent vaccine. PloS one 8, e64731.

Mavroidi, A., Godoy, D., Aanensen, D.M., Robinson, D.A., Hollingshead, S.K., Spratt, B.G., 2004.

Evolutionary genetics of the capsular locus of serogroup 6 pneumococci. Journal of bacteriology

186, 8181-8192.

McAvin, J.C., Reilly, P.A., Roudabush, R.M., Barnes, W.J., Salmen, A., Jackson, G.W., Beninga,

K.K., Astorga, A., McCleskey, F.K., Huff, W.B., Niemeyer, D., Lohman, K.L., 2001. Sensitive

and specific method for rapid identification of Streptococcus pneumoniae using real-time

fluorescence PCR. Journal of clinical microbiology 39, 3446-3451.

Meier-Kolthoff, J.P., Auch, A.F., Klenk, H.P., Goker, M., 2013. Genome sequence-based species

delimitation with confidence intervals and improved distance functions. BMC bioinformatics 14,

60.

Morrison, K.E., Lake, D., Crook, J., Carlone, G.M., Ades, E., Facklam, R., Sampson, J.S., 2000.

Confirmation of psaA in all 90 serotypes of Streptococcus pneumoniae by PCR and potential of

this assay for identification and diagnosis. Journal of clinical microbiology 38, 434-437.

Nelson, A.L., Roche, A.M., Gould, J.M., Chim, K., Ratner, A.J., Weiser, J.N., 2007. Capsule

enhances pneumococcal colonization by limiting mucus-mediated clearance. Infection and

immunity 75, 83-90.

Neufeld, F., Händel, L., 1910. Weitere Untersuchungen über Pneumokokken-Heilsera. III. Mitteilung.

Arbeiten aus dem Kaiserlichen Gesundheitsamte 34, 293–304.

O'Brien, K.L., Wolfson, L.J., Watt, J.P., Henkle, E., Deloria-Knoll, M., McCall, N., Lee, E.,

Mulholland, K., Levine, O.S., Cherian, T., Hib and Pneumococcal Global Burden of Disease Study

Team, 2009. Burden of disease caused by Streptococcus pneumoniae in children younger than 5 years:

global estimates. Lancet 374, 893-902.

O'Neill, A.M., Gillespie, S.H., Whiting, G.C., 1999. Detection of penicillin susceptibility in

Streptococcus pneumoniae by pbp2b PCR-restriction fragment length polymorphism analysis.

Journal of clinical microbiology 37, 157-160.

Pai, R., Gertz, R.E., Beall, B., 2006. Sequential multiplex PCR approach for determining capsular

serotypes of Streptococcus pneumoniae isolates. Journal of clinical microbiology 44, 124-131.

Park, I.H., Moore, M.R., Treanor, J.J., Pelton, S.I., Pilishvili, T., Beall, B., Shelly, M.A., Mahon,

B.E., Nahm, M.H., Active Bacterial Core Surveillance Team, 2008. Differential effects of

pneumococcal vaccines against serotypes 6A and 6C. The Journal of infectious diseases 198, 1818-

1822.

Pimenta, F.C., Roundtree, A., Soysal, A., Bakir, M., du Plessis, M., Wolter, N., von Gottberg, A.,

McGee, L., Carvalho, M.G., Beall, B., 2013. Sequential triplex real-time PCR assay for detecting

21 pneumococcal capsular serotypes that account for a high global disease burden. Journal of

clinical microbiology 51, 647-652.

Richter, M., Rosselló-Móra, R., 2009. Shifting the genomic gold standard for the prokaryotic

species definition. Proceedings of the National Academy of Sciences of the United States of

America 106, 19126-19131.

Richter, M., Rosselló-Móra, R., Oliver Glockner, F., Peplies, J., 2016. JSpeciesWS: a web server

for prokaryotic species circumscription based on pairwise genome comparison. Bioinformatics 32,

929-931.

Richter, S.S., Heilmann, K.P., Dohrn, C.L., Riahi, F., Beekmann, S.E., Doern, G.V., 2008.

Accuracy of phenotypic methods for identification of Streptococcus pneumoniae isolates included

in surveillance programs. Journal of clinical microbiology 46, 2184-2188.

Richter, S.S., Heilmann, K.P., Dohrn, C.L., Riahi, F., Diekema, D.J., Doern, G.V., 2013.

Evaluation of pneumococcal serotyping by multiplex PCR and quellung reactions. Journal of

clinical microbiology 51, 4193-4195.

Rolo, D., Simões, A.S., Domenech, A., Fenoll, A., Linares, J., de Lencastre, H., Ardanuy, C., Sa-Leao,

R., 2013. Disease isolates of Streptococcus pseudopneumoniae and non-typeable S. pneumoniae

presumptively identified as atypical S. pneumoniae in Spain. PloS one 8, e57047.

Rosselló-Móra, R., Amann, R., 2015. Past and future species definitions for Bacteria and Archaea.

Systematic and applied microbiology 38, 209-216.

Salvà-Serra, F., Connolly, G., Moore, E.R.B., Gonzales-Siles, L., 2017. Detection of "Xisco" gene

for identification of Streptococcus pneumoniae isolates. Diagnostic microbiology and infectious

disease.

Scholz, C.F., Poulsen, K., Kilian, M., 2012. Novel molecular method for identification of

Streptococcus pneumoniae applicable to clinical microbiology and 16S rRNA sequence-based

microbiome studies. Journal of clinical microbiology 50, 1968-1973.

Slotved, H.C., Dalby, T., Hoffmann, S., 2016. The effect of pneumococcal conjugate vaccines on

the incidence of invasive pneumococcal disease caused by ten non-vaccine serotypes in Denmark.

Vaccine 34, 769-774.

Tanmoy, A.M., Saha, S., Darmstadt, G.L., Whitney, C.G., Saha, S.K., 2016. PCR-Based

Serotyping of Streptococcus pneumoniae from Culture-Negative Specimens: Novel Primers for

Detection of Serotypes within Serogroup 18. Journal of clinical microbiology 54, 2178-2181.

Teng, L.J., Hsueh, P.R., Tsai, J.C., Chen, P.W., Hsu, J.C., Lai, H.C., Lee, C.N., Ho, S.W., 2002.

groESL sequence determination, phylogenetic analysis, and species differentiation for viridans

group streptococci. Journal of clinical microbiology 40, 3172-3178.

Varghese, R., Jayaraman, R., Veeraraghavan, B., 2017. Current challenges in the accurate

identification of Streptococcus pneumoniae and its serogroups/serotypes in the vaccine era. J

Microbiol Methods 141, 48-54.

Weinberger, D.M., Malley, R., Lipsitch, M., 2011. Serotype replacement in disease after

pneumococcal vaccination. Lancet 378, 1962-1973.

Welinder-Olsson, C., Kjellin, E., Badenfors, M., Kaijser, B., 2000. Improved microbiological

techniques using the polymerase chain reaction and pulsed-field gel electrophoresis for diagnosis

and follow-up of enterohaemorrhagic Escherichia coli infection. European journal of clinical

microbiology & infectious diseases : official publication of the European Society of Clinical

Microbiology 19, 843-851.

Whatmore, A.M., Efstratiou, A., Pickerill, A.P., Broughton, K., Woodard, G., Sturgeon, D.,

George, R., Dowson, C.G., 2000. Genetic relationships between clinical isolates of Streptococcus

pneumoniae, Streptococcus oralis, and Streptococcus mitis: characterization of "Atypical"

pneumococci and organisms allied to S. mitis harboring S. pneumoniae virulence factor-encoding

genes. Infection and immunity 68, 1374-1382.

Table 1. Taxonomic identification of the correctly-classified Streptococcus pseudopneumoniae strains from GenBank.

| GenBank strain | ANIb all vs. all | groEL 1st match | % | groEL 2nd match | % | groEL 3rd match | % | ANIb: S. pseudopneumoniae | ANIb: S. pneumoniae | ANIb: S. mitis | GGDC (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1321 | S. pseudopneumoniae | S. pseudopneumoniae | 100 | S. mitis | 95.8 | S. pneumoniae | 94.1 | 98.4 | 94.2 | 91.8 | 86.6 |
| 22725 | S. pseudopneumoniae | S. pneumoniae | 99.6 | S. pseudopneumoniae | 93.9 | S. mitis | 93.4 | 96.9 | 94.1 | 91.9 | 74.9 |
| 276-03 | S. pseudopneumoniae | S. pneumoniae | 99.5 | S. pseudopneumoniae | 93.8 | S. mitis | 93.3 | 97 | 94.1 | 91.9 | 75.6 |
| 338-14 | S. pseudopneumoniae | S. pneumoniae | 99.6 | S. pseudopneumoniae | 93.9 | S. mitis | 93.4 | 97 | 94.1 | 91.8 | 75.5 |
| 5247 | S. pseudopneumoniae | S. pseudopneumoniae | 96.4 | S. mitis | 95.7 | S. pneumoniae | 95 | 96.5 | 94.1 | 91.9 | 72.3 |
| 61-14 | S. pseudopneumoniae | S. pneumoniae | 99.6 | S. pseudopneumoniae | 93.9 | S. mitis | 93.4 | 96.8 | 94 | 91.6 | 75.5 |
| G42 | S. pseudopneumoniae | S. pneumoniae | 99.1 | S. pseudopneumoniae | 94.5 | S. mitis | 93.9 | 96.9 | 94 | 91.8 | 75.3 |
| IS7493 | S. pseudopneumoniae | S. pneumoniae | 99.6 | S. pseudopneumoniae | 93.9 | S. mitis | 93.4 | 96.8 | 94 | 91.9 | 73.8 |
| SK674 | S. pseudopneumoniae | S. pseudopneumoniae | 100 | S. mitis | 95.8 | S. pneumoniae | 94.1 | 98.8 | 94.1 | 91.9 | 89.9 |

The results show the 9 of the 39 genome sequences classified as S. pseudopneumoniae in the GenBank database whose taxonomic identity as S. pseudopneumoniae was confirmed by ANIb and GGDC analyses.


Table 2. Taxonomic identifications of the correctly-classified Streptococcus mitis strains from GenBank.

| NCBI strain | ANIb all vs all | groEL 1st match | % | groEL 2nd match | % | groEL 3rd match | % | ANIb similarity to S. mitis | ANIb similarity to S. pseudopneumoniae | ANIb similarity to S. pneumoniae | GGDC (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 10712 | S. mitis | S. mitis | 100 | S. pseudopneumoniae | 95.8 | S. oralis | 94.2 | 95.0 | 92.1 | 91.4 | 62.3 |
| 11/5 | S. mitis | S. mitis | 98.6 | S. pseudopneumoniae | 95.9 | S. pneumoniae | 93.5 | 93.2 | 91.6 | 90.5 | 62.7 |
| 21/39 | S. mitis | S. mitis | 97.4 | S. pseudopneumoniae | 95.8 | S. pneumoniae | 94.1 | 93.1 | 92.3 | 91.6 | 53.4 |
| 27/7 | S. mitis | S. mitis | 96.7 | S. pseudopneumoniae | 95.1 | S. pneumoniae | 93.5 | 95.5 | 91.9 | 91.3 | 64.1 |
| 29/42 | S. mitis | S. mitis | 99.1 | S. pseudopneumoniae | 95.8 | S. oralis | 94.2 | 95.2 | 91.9 | 91.2 | 63.0 |
| OT25 | S. mitis | S. mitis | 97 | S. pseudopneumoniae | 95.9 | S. oralis | 94.3 | 93.8 | 92.1 | 91.5 | 56.0 |
| SK1080 | S. mitis | S. mitis | 96.8 | S. pseudopneumoniae | 95.9 | S. pneumoniae | 94.5 | 92.9 | 93.2 | 92.7 | 51.8 |
| SK271 | S. mitis | S. mitis | 99.3 | S. pseudopneumoniae | 95.8 | S. oralis | 93.8 | 95.6 | 92.3 | 91.5 | 64.9 |
| SK321 | S. mitis | S. mitis | 96 | S. pseudopneumoniae | 94.5 | S. pneumoniae | 94.2 | 93.6 | 92.2 | 91.4 | 55.8 |
| SK578 | S. mitis | S. mitis | 96 | S. pseudopneumoniae | 95.4 | S. pneumoniae | 94.1 | 93.1 | 91.5 | 90.8 | 54.1 |
| SK642 | S. mitis | S. mitis | 96.6 | S. pseudopneumoniae | 95.7 | S. pneumoniae | 94.1 | 92.5 | 91.3 | 90.9 | 50.0 |
| SK579 | S. mitis | S. pneumoniae | 96 | S. mitis | 95 | S. pseudopneumoniae | 94.9 | 92.6 | 91.5 | 90.9 | 50.9 |
| 1111_SMIT | S. mitis | S. mitis | 94.2 | S. pseudopneumoniae | 93.8 | S. oralis | 93.5 | 95.1 | 92.1 | 91.2 | 54.1 |
| 13/39 | S. mitis | S. mitis | 95.8 | S. pseudopneumoniae | 95.4 | S. pneumoniae | 93.8 | 93.4 | 91.6 | 90.8 | 54.4 |
| 17/34 | S. mitis | S. mitis | 94.6 | S. oralis | 94 | S. pseudopneumoniae | 93.7 | 92.2 | 91.3 | 90.8 | 48.7 |
| 18/56 | S. mitis | S. mitis | 94.7 | S. pseudopneumoniae | 94.7 | S. pneumoniae | 93.7 | 93.1 | 91.5 | 91.1 | 53.0 |
| 850_SMIT | S. mitis | S. pneumoniae | 95.8 | S. mitis | 95 | S. pseudopneumoniae | 94.7 | 93.1 | 92.4 | 91.8 | 53.9 |
| B6 | S. mitis | S. mitis | 94.6 | S. pseudopneumoniae | 93.5 | S. pneumoniae | 93.5 | 93.3 | 91.9 | 91.5 | 53.5 |
| CMW7705B | S. mitis | S. mitis | 94.5 | S. pseudopneumoniae | 94.1 | S. oralis | 93.7 | 93.2 | 92.3 | 91.7 | 53.6 |
| DD22 | S. mitis | S. mitis | 92.9 | S. pseudopneumoniae | 92.6 | S. pneumoniae | 91.7 | 93.2 | 91.4 | 90.8 | 53.7 |
| DD26 | S. mitis | S. pseudopneumoniae | 95.1 | S. pneumoniae | 94.7 | S. mitis | 94.5 | 92.4 | 93.1 | 92.8 | 49.7 |
| DD28 | S. mitis | S. pneumoniae | 95.3 | S. mitis | 93.7 | S. pseudopneumoniae | 93.7 | 92.6 | 92.7 | 92.1 | 50.6 |
| KCOM 1350 | S. mitis | S. mitis | 95 | S. pneumoniae | 94.7 | S. pseudopneumoniae | 94.6 | 94.0 | 91.8 | 91.3 | 56.9 |
| SK1073 | S. mitis | S. mitis | 95.7 | S. pseudopneumoniae | 94.1 | S. pneumoniae | 93.4 | 94.0 | 91.8 | 91.3 | 56.9 |
| SK1126 | S. mitis | S. mitis | 94.5 | S. pseudopneumoniae | 93.4 | S. pneumoniae | 93.1 | 93.0 | 91.7 | 91.1 | 52.2 |
| SK145 | S. mitis | S. mitis | 95.4 | S. pseudopneumoniae | 95.1 | S. pneumoniae | 93.8 | 94.6 | 92.1 | 91.3 | 60.0 |
| SK564 | S. mitis | S. mitis | 95.9 | S. pseudopneumoniae | 95.1 | S. pneumoniae | 94.2 | 92.7 | 92.3 | 91.6 | 52.1 |
| SK569 | S. mitis | S. pneumoniae | 94.7 | S. mitis | 94.2 | S. pseudopneumoniae | 93.5 | 92.6 | 91.7 | 91.1 | 50.9 |
| SK575 | S. mitis | S. mitis | 95 | S. pseudopneumoniae | 95 | S. pneumoniae | 94.2 | 93.2 | 92.4 | 91.6 | 53.3 |
| SK597 | S. mitis | S. pseudopneumoniae | 95.9 | S. mitis | 94.9 | S. pneumoniae | 93.5 | 93.0 | 92.6 | 91.9 | 52.5 |
| SK608 | S. mitis | S. mitis | 94.9 | S. pneumoniae | 94.3 | S. oralis | 93.7 | 93.1 | 92.1 | 91.5 | 53.7 |
| SK616 | S. mitis | S. pneumoniae | 94.7 | S. mitis | 94.2 | S. pseudopneumoniae | 93.5 | 92.4 | 91.6 | 91.1 | 50.4 |
| SK629 | S. mitis | S. mitis | 94.6 | S. pseudopneumoniae | 94.5 | S. pneumoniae | 93.3 | 92.3 | 91.5 | 91.2 | 50.2 |
| SK637 | S. mitis | S. mitis | 94.5 | S. oralis | 94.2 | S. pseudopneumoniae | 93.7 | 93.7 | 92.0 | 91.2 | 55.0 |
| SK667 | S. mitis | S. mitis | 95.4 | S. pseudopneumoniae | 94.5 | S. pneumoniae | 93.8 | 92.3 | 92.0 | 91.2 | 50.5 |
| SVGS_061 | S. mitis | S. pneumoniae | 93.5 | S. mitis | 93 | S. oralis | 93 | 92.0 | 91.5 | 90.7 | 49.4 |
| M3-1 | S. mitis | (a) | | | | | | 93.2 | 90.7 | 91.3 | 54.3 |
| M3-4 | S. mitis | (a) | | | | | | 93.2 | 90.7 | 91.4 | 54.3 |
| SK137 | S. mitis | (a) | | | | | | 94.8 | 91.2 | 92.0 | 60.5 |
| SK137 | S. mitis | (a) | | | | | | 94.8 | 91.2 | 92.1 | 60.4 |

The table shows the 36 of the 52 genome sequences classified as S. mitis in GenBank whose taxonomic identity as S. mitis was confirmed by ANIb and GGDC analyses.

(a) The groEL gene was not present in the genome sequence.

Table 3. Serotypes identified by sequetyping, using in silico analysis of the 329 genome sequences of S. pneumoniae from GenBank included in our local database.

| Serotype | # strains |
|---|---|
| 1 | 33 |
| 2/41A | 1 |
| 3 | 14 |
| 4 | 11 |
| 5 | 1 |
| 6 | 32 |
| 7A/7F | 7 |
| 8 | 2 |
| 9A/9V | 11 |
| 9N/9L | 2 |
| 10A | 3 |
| 10C | 1 |
| 10C/10F (a) | 8 |
| 11a/11D/18F | 6 |
| 12a/46 | 3 |
| 13/20 | 2 |
| 14 | 17 |
| 21 | 1 |
| 15A/15F | 4 |
| 16F | 1 |
| 17F/33C | 4 |
| 18B/18C | 3 |
| 19A | 48 |
| 19B | 1 |
| 19C | 2 |
| 19F | 28 |
| 22A/22F | 5 |
| 23A | 2 |
| 23F | 16 |
| 24F (a) | 7 |
| 33A/35A/33F | 2 |
| 35F/47F | 1 |
| NT | 36 |
| NI | 14 |

(a) Match with low similarity value. NT, non-typable; NI, non-identified.


Table 4. Serotype identification of strains isolated from clinical samples.

| Antiserum panel (a) | Real-time PCR | Sequetyping | # positives |
|---|---|---|---|
| 3 | 3 | 3 | 5 |
| 4 | 4 | 4 | 1 |
| 6A | 6A/6B/6C/6D | 6A/6B/6C/6D | 1 |
| 6B | 6A/6B/6C/6D | 6A/6B/6C/6D | 1 |
| 7F | 7F/7A | 7F/7A | 5 |
| 9N | 9N/9L | 9N/9L | 2 |
| 9V | 9A/9V | 9A/9V | 1 |
| 8 | 8 | 8 | 2 |
| 31 | NA | 31 | 1 |
| 11A | 11A/11D | 11A/11D/18F | 2 |
| 11D | 11A/11D | 11A/11D/18F | 1 |
| 12F | 12F/12A/44/46 | 12F | 1 |
| 15A | NA | 15A | 2 |
| 15B | 15B/15C | 15B/15C | 1 |
| 15C | 15B/15C | 15B/15C | 1 |
| 18A | 18 | 18A | 1 |
| 18C | 18 | 18B/18C | 3 |
| 19A | 19A | 19A | 1 |
| 19F | 19F | 19F/19A | 1 |
| 20 | 20 | 20/13 | 2 |
| 22F | 22F/22A | 22F/22A | 4 |
| 23B | NA | 23B | 1 |
| 23F | 23F | 23F | 1 |
| 33F | 33F/33A/37 | 35A/33F/33A | 3 |
| 35A | NA | 35C/35B/35A | 2 |
| 35B | NA | 35C/35B/35A | 2 |
| 35F | NA | 47F/35F | 2 |

NA, non-amplified (not targeted by the assay).

(a) Performed at the Public Health Agency of Sweden.


Table 5. In silico identification of serogroup 6.

| S. pneumoniae strain | Accession number | PCR 1 (a): A/G | PCR 1 (a): serogroup | PCR 2 (b): PCR | PCR 2 (b): deletion | PCR 3 (c) | PCR 4 (d): S1-A2 | PCR 5 (e): S2-A1 | Serotype |
|---|---|---|---|---|---|---|---|---|---|
| 07AR0125 | AFBY01000000.1 | G | AC | + | + | - | + | + | 6C |
| 801 | AQTO01000000.1 | G | AC | + | - | + | - | - | 6A |
| 845 | AQTP01000000.1 | G | AC | + | - | + | - | - | 6A |
| 1488 | AQTQ01000000.1 | A | BD | + | - | - | - | - | 6B |
| 670-6B | CP002176.1 | A | BD | + | - | + | - | - | 6B |
| BHN191 | ASHN01000000.1 | A | BD | + | - | - | - | - | 6B |
| BHN237 | ASHO01000000.1 | A | BD | + | - | + | - | - | 6B |
| BHN418 | ASHP01000000.1 | A | BD | + | - | + | - | - | 6B |
| BHN427 | ASHQ01000000.1 | A | BD | + | - | + | - | - | 6B |
| BR1064 | AFBZ01000000.1 | G | AC | + | + | + | + | + | 6C |
| CCUG1350 | LQQG01000000.1 | A | BD | + | - | + | - | - | 6B |
| CDC1873-00 | ABFS01000000.1 | G | AC | + | - | + | - | - | 6A |
| EU-NP04 | AIKH01000000.1 | G | AC | + | + | + | + | + | 6C |
| GA02270 | AIKJ01000000.1 | G | AC | + | - | - | - | - | 6A |
| GA02506 | AILJ01000000.1 | A | BD | + | - | - | - | - | 6B |
| GA02714 | AIKK01000000.1 | G | AC | + | - | - | - | - | 6A |
| GA14373 | AILN01000000.1 | G | AC | + | - | - | - | - | 6A |
| GA17328 | AGPH01000000.1 | G | AC | + | - | - | - | - | 6A |
| GA17971 | AGPJ01000000.1 | G | AC | + | - | + | - | - | 6A |
| GA19077 | AGPK01000000.1 | G | AC | + | - | + | - | - | 6A |
| GA41437 | AGPN01000000.1 | G | AC | + | - | + | - | - | 6A |
| GA47033 | AGOA01000000.1 | G | AC | + | + | + | + | + | 6C |
| GA52306 | AGPZ01000000.1 | G | AC | + | + | + | + | + | 6C |
| GA60080 | ALCR01000000.1 | G | AC | + | + | + | + | + | 6C |
| GA60132 | ALCV01000000.1 | G | AC | + | + | + | + | + | 6C |
| GA60190 | ALCL01000000.1 | G | AC | + | + | + | + | + | 6C |
| NorthCarolina6A-23 | AGQL01000000.1 | G | AC | + | - | + | - | - | 6A |
| NP127 | AGQC01000000.1 | G | AC | + | - | + | - | - | 6A |
| SP6-BS73 | ABAA01000000.1 | G | AC | + | - | + | - | - | 6A |
| SPAR55 | ALCF01000000.1 | G | AC | + | - | + | - | - | 6A |
| WL400 | AVFA01000000.1 | G | AC | + | - | + | - | - | 6A |
| K15-99 | HQ662206.1 | A | BD | - | --- | - | + | + | 6D |
| K15-60 | HQ662205.1 | A | BD | - | --- | - | + | + | 6D |
| K15-17 | HQ662214.1 | A | BD | - | --- | - | + | + | 6D |
| K15-129 | HQ662208.1 | A | BD | - | --- | - | + | + | 6D |
| K15-115 | HQ662207.1 | A | BD | - | --- | - | + | + | 6D |
| K13-22 | HQ662215.1 | A | BD | - | --- | - | + | + | 6D |
| K13-110 | HQ662218.1 | A | BD | - | --- | - | + | + | 6D |
| K13-109 | HQ662217.1 | A | BD | - | --- | - | + | + | 6D |
| K13-108 | HQ662216.1 | A | BD | - | --- | - | + | + | 6D |
| B0704-047 | HQ662213.1 | A | BD | - | --- | - | + | + | 6D |
| 07-107 | HQ662212.1 | A | BD | - | --- | - | + | + | 6D |
| 07-077 | HQ662211.1 | A | BD | - | --- | - | + | + | 6D |
| 07-056 | HQ662210.1 | A | BD | - | --- | - | + | + | 6D |
| Tw02-238 | HQ662209.1 | A | BD | - | --- | - | + | + | 6D |

Primer pairs: (a) wciP374F - wciP-r; (b) Del6Cwzy_Fv2 - Del6Cwzy_R; (c) wciN_6AB_F - wciN_6AB_R; (d) wciNbetaS1 - wciNbetaA2; (e) wciNbetaA1 - wciNbetaS2
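The pattern in Table 5 can be summarised as a small decision rule, sketched below in Python. The rule is read off the rows of the table and is not necessarily the authors' full scheme: it uses only the PCR 1 base (G for serogroup AC, A for serogroup BD) and the two wciNbeta reactions (PCR 4 and PCR 5), and it ignores PCR 2 and PCR 3. The function name and arguments are hypothetical.

```python
def call_serogroup6(pcr1_base, s1_a2, s2_a1):
    """Infer a serogroup 6 serotype from in silico PCR results.

    pcr1_base: "G" (serogroup AC) or "A" (serogroup BD), from PCR 1.
    s1_a2, s2_a1: True if PCR 4 / PCR 5 (the wciNbeta primer pairs) amplify.
    Returns "6A", "6B", "6C" or "6D", or None when the two wciNbeta
    reactions disagree (a case that does not occur in Table 5).
    """
    base = pcr1_base.upper()
    if base not in ("A", "G"):
        raise ValueError(f"unexpected PCR 1 base: {pcr1_base!r}")
    if s1_a2 != s2_a1:
        return None  # discordant wciNbeta results: left uncalled in this sketch
    if base == "G":
        return "6C" if s1_a2 else "6A"
    return "6D" if s1_a2 else "6B"


# One example row per serotype, with values copied from Table 5.
examples = [
    ("801", "G", False, False),
    ("1488", "A", False, False),
    ("BR1064", "G", True, True),
    ("K15-99", "A", True, True),
]
for strain, base, s1_a2, s2_a1 in examples:
    print(strain, call_serogroup6(base, s1_a2, s2_a1))
```

Applied to the four example rows, the rule returns 6A, 6B, 6C and 6D respectively, matching the Serotype column of Table 5.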


Figure 1. Schematic representation of the cpsB gene targeted for amplification with two primer pairs. The gene lies in the conserved region (cpsA, cpsB, cpsC and cpsD) of the CPS locus of Streptococcus pneumoniae.


Figure 2. Schematic representation of serogroup 6 differentiation.
