www.nature.com/scientificreports

Estimation of pathogenic potential of an environmental Pseudomonas aeruginosa isolate using comparative genomics

Carola Berger1, Christian Rückert2, Jochen Blom3, Korneel Rabaey4, Jörn Kalinowski2 & Miriam A. Rosenbaum1,5*

The isolation and sequencing of new strains of Pseudomonas aeruginosa has created an extensive dataset of closed genomes. Many of the publicly available genomes are only used in their original publication, while additional in silico information, based on comparison to previously published genomes, is not being explored. In this study, we defined and investigated the genome of the environmental isolate P. aeruginosa KRP1 and compared it to more than 100 publicly available closed P. aeruginosa genomes. By using different genomic island prediction programs, we could identify a total of 17 genomic islands and 8 genomic islets, marking the majority of the accessory genome, which covers ~12% of the total genome. Based on intra-strain comparisons, we are able to predict the pathogenic potential of this environmental isolate. It shares a substantial amount of genomic information with the highly virulent PSE9 and LESB58 strains. For both of these, the increased virulence has previously been directly linked to their accessory genome. Hence, the integrated use of previously published data can help to minimize the expensive and time-consuming wet-lab work needed to determine the pathogenic potential.

Pseudomonas aeruginosa has been isolated from terrestrial and marine soil, fresh and salt water, sewage, plants, animals, and humans1. For the latter habitats, it is known as an opportunistic pathogen, which usually spreads to already vulnerable patients, causing ~10% of all nosocomial infections in most European Union hospitals2. Its combinatory virulence is transmitted through the action of a myriad of virulence factors. Not every P.
aeruginosa isolate conveys an equal level of virulence to a given infection model, and a strain that is effective in infecting a plant does not necessarily show an equal amount of virulence towards an animal3,4. For the frequently researched P. aeruginosa PA14 strain, the increased virulence, as compared to the type strain PAO1, is mainly due to the presence of additional virulence factors. Their genes are predominantly clustered on two genomic islands (GIs) termed P. aeruginosa pathogenicity islands (PAPI)5.

Due to short generation times, mutations are frequently observed in bacterial genomes, which makes them a dynamic rather than a static gene collection6. For P. aeruginosa, numerous studies have shown that the pan genome can be viewed as a mosaic of a conserved core (~90% of a specific genome) and variable accessory sets7–9. Core genes are defined as genes with orthologues in nearly all strains, which show a conserved synteny and a low average nucleotide substitution rate7. One study suggests that the core genome of the P. aeruginosa species, which makes up the smallest fraction of the pan genome, consists of 4000–5000 open reading frames (ORFs)4. The second fraction is the accessory genome with about 10,000 genes. It can be grouped according to general features like the means of inter- and intrachromosomal relocation. By assigning different functional modules, it can be sorted into (i) integrative and conjugative elements (ICEs), (ii) replacement islands, (iii) prophages and phage-like elements, and (iv) transposons, insertion sequences (ISs) and integrons7. These genes are shared by some, but not all, strains of the species and are mainly located in GIs and genomic islets (GIts). By definition, GIs have a size of at least 10 kb, while GIts are smaller than 10 kb. Both types of elements have been acquired via horizontal gene transfer7. They are the cause of variation in the genome size of P.
aeruginosa, which has been reported to range from 5.2 to 7.4 Mb4,8. By prokaryotic standards, this is considered rather large, encoding genes from numerous and distinct gene families. This highlights the great genetic and functional diversity of this species7.

1Bio Pilot Plant, Leibniz Institute for Natural Product Research and Infection Biology, Hans-Knöll-Institute (HKI), Beutenbergstr. 11a, 07745 Jena, Germany.
2Center for Biotechnology - CeBiTec, University of Bielefeld, Bielefeld, Germany.
3Bioinformatics and Systems Biology, Justus-Liebig University Gießen, Giessen, Germany.
4Laboratory of Microbial Ecology and Technology (LabMET), Ghent University, Ghent, Belgium.
5Faculty of Biological Sciences, Friedrich Schiller University, Jena, Germany.
*email: [email protected]

Scientific Reports | (2021) 11:1370 | https://doi.org/10.1038/s41598-020-80592-8

Table 1. Genomic overview of different P. aeruginosa strains used in this study. ANI analysis was performed with the EDGAR platform24,25.

| P. aeruginosa strain | Total length (bps) | G + C-content (%) | Number of predicted genes | ANI with KRP1 (%) | Comment | References |
|---|---|---|---|---|---|---|
| KRP1 | 6,737,396 | 66.3 | 6301 | | | This study |
| PAO1 | 6,264,404 | 66.6 | 5700 | 99.24 | Type strain | 23 |
| PA14 | 6,537,648 | 66.3 | 6177 | 98.36 | Common research strain | 3 |
| LESB58 | 6,601,757 | 66.3 | 6135 | 98.81 | Hyper virulent strain | 15 |
| FA-HZ1 | 6,866,790 | 66.2 | 6389 | 99.98 | Closest sequenced relative to KRP1 | 27 |
| W45909 | 6,777,566 | 66.2 | 6475 | 99.96 | 2nd closest sequenced relative to KRP1 | 28 |

Depending on the encoded genes, GIs can be classified into four functional categories: (i) pathogenicity islands (PIs; predominantly encoding pathogenicity factors), (ii) resistance islands (RIs; predominantly encoding resistance functions), (iii) metabolic islands (MIs; predominantly encoding biosynthesis of (secondary) metabolites), and (iv) symbiotic islands (SIs; predominantly encoding genes related to a host-bacterium symbiotic relationship)19.
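The size rule stated earlier (GIs are at least 10 kb, GIts are smaller than 10 kb) is simple enough to express in a few lines. The following is a minimal Python sketch, not the authors' pipeline: the class `AccessoryRegion`, the helper names, and all coordinates used in the example are hypothetical and serve only to illustrate the definition and how an accessory-genome fraction could be computed from non-overlapping regions.

```python
from dataclasses import dataclass
from typing import Iterable

# Size cut-off taken from the definition in the text: GIs are at least 10 kb.
GI_MIN_LENGTH_BP = 10_000


@dataclass(frozen=True)
class AccessoryRegion:
    """Hypothetical container for one accessory region (1-based, inclusive)."""
    name: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def classify_region(region: AccessoryRegion) -> str:
    """Return 'GI' for regions of at least 10 kb, otherwise 'GIt'."""
    return "GI" if region.length >= GI_MIN_LENGTH_BP else "GIt"


def accessory_fraction(regions: Iterable[AccessoryRegion], genome_length: int) -> float:
    """Fraction of the genome covered by the regions.

    Assumes the regions do not overlap; overlapping input would be counted twice.
    """
    return sum(r.length for r in regions) / genome_length


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    example = [
        AccessoryRegion("region_1", 1, 10_000),
        AccessoryRegion("region_2", 20_001, 25_000),
    ]
    for region in example:
        print(region.name, region.length, classify_region(region))
    print(accessory_fraction(example, genome_length=150_000))
```

The threshold is kept as a named constant so that a different cut-off could be tried without touching the classification logic.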
By far the largest fraction of the pan genome consists of singletons and rare genes that are shared by only very few strains. Their estimated number is at least 30,000 for the P. aeruginosa species4.

Over the years, different nomenclatures were established, naming the islands PAPI-X (P. aeruginosa pathogenicity island), PAGI-X (P. aeruginosa genomic island) and LESGI-X (Liverpool Epidemic Strain genomic island). It is important to note that no direct correlation between PAGI and LESGI exists and that the respective islands are not exclusive to the PA or LES strains of P. aeruginosa. Besides PAPI-I and PAPI-II of P. aeruginosa PA14, 42 other GIs have been previously described in the P. aeruginosa species9–12, of which multiple have been directly linked to an increased pathogenicity of the harboring strains12–15.

Different detection software packages are available to help identify regions of foreign DNA within a given genome. As the algorithms use different characteristics and have different degrees of sensitivity and different shortcomings, usually no single program is able to identify all GIs and GIts. Hence, a combination of multiple complementary tools should be applied to achieve a thorough detection. Here, we used the established bioinformatic tools SIGI-HMM16, IslandPath-DIMOB17, PHASTER18 and GIPSy19.

In this study, we describe how the abundantly available sequencing information of a species like P. aeruginosa can be used to characterize a newly sequenced strain. To this end, we sequenced the KRP1 environmental isolate and characterized its phylogenetic relationship by using more than 100 previously published closed P. aeruginosa genomes. We further employed different GI detection software programs and manual mining to investigate the genome composition of this exemplary strain. The strain KRP1 was first isolated from a microbial fuel cell as one of the dominating bacterial species responsible for the high electron transfer efficiency of the mixed community20.
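The recommendation to combine several complementary prediction tools can be illustrated with a small interval-merging sketch in Python. It assumes that each tool's output has already been reduced to (start, end) coordinates on the same sequence and simply takes the union of overlapping predictions while recording which tools support each merged region. This is one plausible way to combine predictions, not necessarily the procedure used in this study; the function name, the `max_gap` parameter, and the tool labels and coordinates in the example are all hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def merge_predictions(
    predictions_by_tool: Dict[str, List[Interval]],
    max_gap: int = 0,
) -> List[dict]:
    """Union predicted regions from several tools.

    A prediction is merged into the previous region when it starts no more
    than `max_gap` bp beyond that region's current end. Each merged region
    keeps the set of tools that contributed to it.
    """
    # Flatten to (start, end, tool) and sort by start coordinate.
    intervals = sorted(
        (start, end, tool)
        for tool, regions in predictions_by_tool.items()
        for start, end in regions
    )

    merged: List[dict] = []
    for start, end, tool in intervals:
        if merged and start <= merged[-1]["end"] + max_gap:
            # Overlaps (or lies within max_gap of) the previous region: extend it.
            merged[-1]["end"] = max(merged[-1]["end"], end)
            merged[-1]["tools"].add(tool)
        else:
            merged.append({"start": start, "end": end, "tools": {tool}})
    return merged


if __name__ == "__main__":
    # Invented tool labels and coordinates, for illustration only.
    example = {
        "tool_a": [(100, 20_000)],
        "tool_b": [(15_000, 30_000), (50_000, 55_000)],
    }
    for region in merge_predictions(example):
        print(region["start"], region["end"], sorted(region["tools"]))
```

Keeping the set of supporting tools per region makes it easy to separate regions found by several programs from those reported by only one, which would be natural candidates for manual inspection.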
Our previous study has shown that this strain behaves remarkably differently in lab-operated bioelectrochemical systems compared to other P. aeruginosa variants21, including an increased production of the redox-active pathogenicity factors phenazines. For deeper investigations into the reasons behind this phenomenon in the future, knowledge of the genomic make-up of this strain is needed. By comparing its genomic content with that of highly virulent P. aeruginosa variants, we are able to make educated predictions of the strain's pathogenic potential without having to perform time-consuming and costly animal experiments. While these findings are only predictions and may not be considered proven until actual wet-lab testing has been performed, they can still be of substantial aid for the Pseudomonas community and the labs working with the strain in question.

Results and discussion

Pseudomonas sp. KRP1 belongs to the P. aeruginosa species

The in silico hybrid assembly of the de novo sequenced KRP1 strain resulted in two circular contigs of 6,162,740 bps and 575,136 bps. As a recent study points out, the choice of the assembly algorithm can have a profound impact on all subsequent analyses22. We therefore employed a combination of a short-read and a long-read assembler, followed by manual curation to ensure fulfillment of the suggested 3C criteria (contiguity, correctness and completeness)22. Synteny comparisons between this initial in silico assembly and closely related P. aeruginosa strains showed multiple rearrangements of the ORFs encoded on the putative mega plasmid. In P. aeruginosa PA14, the corresponding sequence is located between two large homologous ribosomal RNA clusters. These clusters are known to be sites of intragenomic rearrangements within the P. aeruginosa species3,23. Therefore, PCR was used to investigate the DNA sequence surrounding the ribosomal RNA clusters on the main chromosome and on the potential mega plasmid.
This resulted in a redefined genome structure of KRP1, with one circular chromosome containing 6,301 annotated protein-coding genes (Table 1). In the original study20, the most similar BLAST hit for isolate KRP1 was Pseudomonas aeruginosa ATCC 27853, at 95% identity along a 197 bp fragment of the 16S rRNA gene. To re-evaluate its phylogenetic relationship within the Pseudomonas genus, the average nucleotide identity (ANI) percentage of the KRP1 genome was calculated with respect to 105 fully sequenced P.