bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
RNA Atlas of Human Bacterial Pathogens Uncovers Stress Dynamics Linked to Infection
Kemal Avican1*, Jehad Aldahdooh2,3, Matteo Togninalli4,5, Jing Tang2,3, Karsten M. Borgwardt4,5, Mikael Rhen6 and Maria Fällman1,7*
1Department of Molecular Biology, Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå Centre for Microbial research (UCMR), Umeå University, Umeå, 90187, Sweden. 2Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, 00014, Finland. 3Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, 00014, Finland 4Department for Biosystems Science and Engineering, ETH Zürich, Basel, 4058, Switzerland. 5Swiss Institute for Bioinformatics, Switzerland. 6Department of Microbiology, Tumor and Cell Biology (MTC), Karolinska Institute, Stockholm, 17165, Sweden 7Lead contact *Correspondence : [email protected] (K.A.), [email protected] (M.F.)
Abstract
Despite being genetically diverse, bacterial pathogens can adapt to similar stressful
environments in human host, but how this diversity allows them to achive this is yet not
fully understood. Knowledge gained through comparative genomics is insufficient as it
lacks the level of gene expression reflecting gene usage. To fill this gap, we
investigated the transcriptome of 32 diverse bacterial pathogens under 11 host related
stress conditions. We revealed that diverse bacterial pathogens have common
responses to similar stresses to a certain extent but mostly employ their unique
repertoire with intersections between different stress responses. We also identified
universal stress responders which shed light on the nature of antimicrobial targets. In
addition, we found that known and unknown putative novel ncRNAs comprised a
significant proportion of the responses. All the data is collected in PATHOgenex atlas,
1 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
providing ample opportunities to discover novel players critical for virulence and
maintenance of infection.
Introduction
Bacterial pathogens with different genetic and physiological features share the
capacity to sense and respond to external changes in the host via regulation of their
global transcriptome. These responses, commonly called stress responses, are critical
for pathogen invasiveness and protection from host immunity, which makes them
topics of interest for future development of new antimicrobials. The responses are often
complex, and synergistic regulation of regulatory networks can be pivotal in sensing
and adapting to environments in different colonization niches during different phases
of infection (Avican et al., 2015; Heine et al., 2018; Malachowa et al., 2015; Mandlik et
al., 2011; Nuss et al., 2017). Infecting bacteria encounter numerous stress conditions
which vary depending on the route and phase of infection. For many pathogens, one
of the first changes upon infection of mammalian hosts is altered temperature.
Consequently, bacterial virulence factors, such as type III secretion systems in Yersinia
spp. and Shigella spp., are positively regulated by temperature (Bolin et al., 1988;
Prosseda et al., 2004). Low pH in the gastrointestinal tract (GIT), genital tract, dental
plaque, and skin, and acid exposure in phagosomes represents additional stressful
environments that pathogenic bacteria have to adapt to (Lukacs et al., 1991; Lund et
al., 2014). Another agent commonly affecting enteric bacteria are bile salts produced
in the liver and secreted into the GIT, as well as secondary bile salts produced by the
microbial flora (Hofmann and Hagey, 2008). The hyperosmotic nature of the blood
stream and GIT can also be harsh for certain pathogens, inducing the expression of
virulence genes in some cases, such as in Helicobacter pylori and Vibrio cholerae
(Ishikawa et al., 2012; Loh et al., 2007). Limited nutrient, oxygen, and iron levels in the
2 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
host are other stresses that most pathogens encounter and have to adapt to for survival
(Fang et al., 2016). Furthermore, many invading bacteria encounter neutrophils and
macrophages, which produce toxic reactive oxygen and nitrogen species.
Bacterial pathogens are diverse regarding gene content and regulatory networks, but
yet equally successful to adapt to these host stresses. Comparative genomics studies
have enabled accumulated knowledge of the diversity of bacterial pathogens.
However, the question, how and when different gene products are employed by
diverse pathogens to cope with same stresses encountered in human host remains to
be answered. To complement comparative genomics studies and answer those
questions, global gene expression profiling of diverse bacterial pathogens under host
related conditions is desired. Even though there are large amount of gene expression
data available for many bacterial pathogens, the collection of such dataset from
different sources is not optimal to complement comparative genomics due to
differences in sample handling, experimental methodologies, and incomplete datasets.
In this study, we analyzed the global expression profile of 32 different bacterial
pathogens, representing 28 different species, under 11 infection-relevant stress
conditions with same experimental set-up and methodology. We assessed expression
profile of 105 088 genes and identified common and specific gene groups to certain
bacterial groups with comparative genomics. All datasets were collected in an
interactive RNA atlas, named PATHOgenex (www.pathogenex.org), freely available to
the research community to interrogate expression profiles of human pathogens. The
datasets were used to uncover similarities and discrepancies in different stress
responses across different groups of bacteria. This was made possible by grouping
genes present in different bacteria according to function and homology and computing
a score showing the probability to be regulated in a particular environment. These
3 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
scores also allowed us to identify universal stress responders, which are genes that
are present in many bacterial species showing altered expression in multiple stress
conditions. This group of genes could potentially serve as a source for studies aiming
to identify novel targets of antimicrobials as it contain many already known targets. We
could also show that non-coding regulatory RNAs can be differentially regulated in
response to stressful environments and that novel small and long non-coding RNAs
can be explored from the dataset.
Results and Discussion
Consensus In Vivo Stress Conditions
The majority of bacteria chosen for the PATHOgenex RNA atlas represent pathogens
causing worldwide health problems. Most of the strains are commonly used in the
microbial research community and diverse in terms of Gram staining, phylogeny, and
oxygen requirement, which should facilitate interpretation and use of data (Figure 1A).
We designed 10 experimental setups mimicking different infection-relevant stress
conditions expected to be encountered by bacterial pathogens in the human host:
acidic stress, bile stress, low iron, hypoxia, nutritional downshift, nitrosative stress,
osmotic stress, oxidative stress, high temperature and starvation (stationary phase). A
very comprehensive suite of infection relevant conditions were previously described
for Salmonella enterica serovar Typhimurium (Kroger et al., 2013). We used similar
conditions that are applicable to diverse human bacterial pathogens. We also included
exposure to specific in vitro virulence inducing conditions, such as shifting Yersinia
pseudotuberculosis from 26°C to 37°C and depleting extracellular Ca2+, L-lactate
supplementation for Neisseria spp. (Sigurlasdottir et al., 2017), and presence of
charcoal-like resin XAD-4 in the growth medium for Listeria monocytogenes
4 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
(Ermolaeva et al., 2004) (Figure 1B). For some species, such as Mycobacterium
tuberculosis and Legionella pneumophila, virulence inducing conditions are unknown
and were not performed. As control for differential expression analyses, we utilized
unexposed, exponentially grown bacteria. The growth temperature and culture
medium were adjusted for each specific strain to achieve optimal growth conditions.
Similarly, the strength of the stress with the different agents and associated exposure
time was designed to be as similar and relevant as possible for each bacterial species.
However, minor changes were necessary for certain conditions; for example, the “low
pH level” used for H. pylori strains were lower than for the others, and a significantly
higher H2O2 concentration was required to induce a stress response in Acinetobacter
baumannii (Table S1).
Large Scale RNA-seq generates Accurate Datasets
All stress experiments were performed in laboratories with expertise in
growing/studying the different strains (Figure 1A and Figure S1). Three biological
replicates were used for statistical analyses, in total 1 122 rRNA-depleted RNA
samples were prepared using RNAtag-Seq, an approach based on tagging RNA
samples with barcode oligonucleotides, enabling pooling of multiple samples in one
RNA-seq library (Shishkin et al., 2015) (Figure 1C). This strategy is expected to limit
biases during sample preparation. We used a set of 36 previously described barcode
DNA oligonucleotides (Shishkin et al., 2015) in which the same set of three oligos was
applied for each condition for consistency (Key Resources). The sequencing reads
from each library were demultiplexed according to the 8-nt unique barcode reads to
separate the different samples. The number of reads for the different barcoded
samples did not reveal significant variation across conditions and species (Table S2),
excluding bias regarding over-representation of one or a set of barcodes during library
5 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
preparation. We obtained an average of 12.2 million reads for each rRNA-depleted
barcoded library, which far exceeds the 2-3 million reads considered sufficient to
determine differentially expressed genes from bacteria with high significance (Haas et
al., 2012). The average mapping of the reads to the reference genomes was >90%
(Table S2, Key Resources). The lowest coverage (67.3%) was seen for the clinical
isolate of Achromobacter xylosoxidans, likely due to the lack of an appropriate
reference genome. In this case, we used the sequenced A. xylosoxidans genome that
gave the highest number of mapped reads. In general, the read mapping to reference
genomes showed efficient rRNA depletion, with only 0.1-5% reads mapping to rRNA
and tRNAs, except for Campylobacter jejuni, two of the H. pylori strains, and Neisseria
gonorrhoeae, which had higher proportions mapping to rRNA (Table S2).
Nevertheless, the library sequencing for these species was deep enough to cover 94%
of the coding sequences (CDSs), with >10 reads per CDS in the least efficiently
depleted library.
Genome Size and Infection Niche Influence Gene Expression
To determine what proportions of genes are active or silent under tested conditions we
calculated the expression levels of annotated CDSs, the transcript per million (TPM)
values, for all strains. Hierarchical clustering of expression profiles clustered replicates
of the same sample together, indicating a robust measurement of gene expression in
all conditions and strains (Figure S2). For gene expression analysis, genes with TPM
≥ 10 were considered to be active and genes with TPM < 10 were considered to be
silent, which is similar to definitions used by others (Kroger et al., 2013; Wagner et al.,
2013). We found that 32-95% of all genes in each of the 32 strains were constitutively
active across the 12 tested conditions. There were also genes only active in certain
conditions (one to 11 conditions), but a set of genes (0.7-20%) were silent in all of the
6 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
conditions (Figure 2A). The silent genes may require more complex environments with
synergistic effects of many stresses, as in real in vivo environments, to be expressed.
It was obvious that bacterial species with relatively smaller genomes had a greater
proportion of active genes in all conditions (Figure 2B). This may indicate conservation
of indispensable genes during the evolutionary shrinkage of the genomes. Reductive
evolution is a well-known event in endosymbionts and pathogenic bacteria with respect
to adaptation in defined niches (Miravet-Verde et al., 2017). This is illustrated by the
larger genomes of ubiquitous environmental species, such as E. coli, and smaller
genomes of niched pathogens, such as Borrelia burgdorferi. A more flexible response
was seen in turning transcription on and off in bacteria with larger genomes, which may
be related to the function and number of regulatory proteins within the species.
Comparative genomics studies of gene content and transcription machinery have
previously demonstrated that the number of transcription factors per open reading
frame increases with genome size in many pathogenic bacteria (Perez-Rueda et al.,
2009). We observed the same pattern within the set of pathogens in this study (Figure
2C). Thus, previous assumptions of less flexibility in turning gene expression on and
off in pathogens with reduced genomes were here verified at the RNA level across
multiple bacterial species.
Most of the bacteria with reduced genomes represent pathogens with restricted niches,
which match the low requirement of dynamic changes in gene expression (Figure 2A).
M. tuberculosis and P. aeruginosa did not follow this trend (Figure 2B), suggesting
that reduced regulation of gene expression is not always due to genome reduction.
Hierarchical clustering based on profiles of constitutively active genes, genes active in
certain conditions, and silent genes generated two bacterial clusters: ubiquitous
environmental species (C1) and niched pathogens (C2) (Figure 2D). Comparing the
7 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
genome sizes of the strains in the two clusters clearly showed that the strains with
restricted infection niches had smaller genomes (Figure 2E). Therefore, our data verify
at the level of RNA expression that bacterial pathogens with restricted niches, mostly
with reduced genomes, have lower plasticity to turn gene expression off.
Bacteria Respond to Stress via Dynamic Changes in Gene Expression
To reveal gene expression patterns associated with responses to different
environmental stimuli we performed differential gene expression analyses for each
strain. Compared to gene expression in unexposed, exponentially grown control
bacteria, most bacteria dynamically responded to the stresses by regulating 62-90%
of their genes in at least one condition (Figure 3A). The highest fraction of regulated
genes was measured under hypoxia, nutritional downshift, and stationary phase
conditions (Figure 3B). The differential expression analysis was verified by checking
for differential expression of known genes that are expected to be regulated under
certain stress conditions. We found that most genes expected to be induced under the
different conditions were upregulated in the species harboring the gene, clearly
showing the accuracy of the data (Figure S3A).
Given that different bacteria have distinct features, we also analyzed the levels of
responses for different groups of bacteria, such as Gram-negative and -positive,
aerobic and microaerophilic, different phylogenic orders (Figure 1A), and the niched
pathogens versus environmental species groups (Figure 2D). The levels of responses
were relatively similar between different groups. For Gram-negative and -positive
strains, however, responses to bile were significantly higher in the latter group, possibly
reflecting a higher magnitude of stress due to having only one plasma membrane
(Figure 3C, Figure S3B, C, and D).
Clustering Genes into Gene Groups for Cross Microbial Comparisons
8 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Comparison of differential expression of genes with similar functions or homologous
genes across different bacterial species is important for a broad and comprehensive
understanding of gene regulation and gene function in terms of adaptation to new
environments. Therefore, we clustered genes from the 32 strains into gene groups
using two different orthology and homology approaches: one based on functional
orthologs using KEGG’s annotation tool GhostKOALA (Kanehisa et al., 2016), and one
based on isofunctional homologs using the PATRIC database with Patric Global Family
(PGFam) (Wattam et al., 2014). KEGG orthology (KO) numbers are assigned to genes
harboring the same function, independent of sequence homology, and can include
genes with different sequences and lengths. For example, KO0123 is assigned to the
genes encoding formate dehydrogenase major subunit, which includes all major
subunits of formate dehydrogenase, including FdhF, FdnG, FdoG, and NapA in our
dataset. On the other hand, PGFam numbers are assigned to genes with the same
function and sequence homology (i.e., isofunctional homologs) as indicated by RAST
and CoreSEED (Davis et al., 2016). Therefore, PGFam groups are more specific in
terms of function and homology, but with fewer genes per group. Consequently, in
contrast to KO groups, formate dehydrogenase subunits were assigned to four
different PGFam groups. For KEGG, the tool assigned 6054 KO numbers to 63 831
genes, representing 60.7% of the total 105 088 genes in the dataset (Figure 4A). For
PGFam, 27 999 PGFam numbers were assigned to 97 565 genes, representing 92.8%
of the total number of genes in the dataset.
Revealing Probability of Gene Groups to be Differentially Expressed
To predict the probability of the genes within the KO or PGFam groups to be regulated
under certain stress conditions, we formulated an equation that computes a stress
condition-specific score indicating differential regulation of genes in the group, defined
9 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
as ‘probability to be differentially expressed (PTDEX) score’. The equation was
employed to gene groups that contain at least 2 genes (5 340 KO groups and 11 353
PGFam groups) (Table S3 and S4). The resulting PTDEX score takes into account
how well conserved the genes in the group are among the 32 pathogens, what
proportion of the genes in the group are differentially regulated under a particular stress
condition, and how well this regulation is preserved among the 32 strains (Figure 4B).
For example, a gene group that is present in all 32 species and differentially expressed
under acidic stress in many of the species will have a high PTDEX score for acidic
conditions. This not only indicates a high probability of the genes in this group being
differentially expressed under acidic stress, but can also hint regulation of homologous
genes from bacterial pathogens not included in this data set. Hence, using PTDEX
scores for comprehensive understanding of gene regulation in bacteria is a novel
approach providing in depth overview of common and specific regulation of same
genes in different species and simplifies cross microbial comparisons (Figure 4C).
To test the power of the resulting PTDEX scores, we ranked the KO and PGFam
groups based on PTDEX scores in low iron and oxidative stress conditions and plotted
the groups representing the highest 20 (Figure 4D and 4E). Both KO and PGFam
PTDEX scores were found to be indicative for regulation under these conditions. Many
of the genes with high PTDEX scores for low iron represented genes encoding proteins
known to be involved in iron uptake and iron homeostasis, such as tonB, exbB, exbD,
mntABC, bfr, bfd, ibpA, ibpB, sufA, feoB, and nuoA (Figure 4D). Similarly, the genes
with highest KO and PGFam PTDEX scores in oxidative stress were genes known to
respond to DNA damage and oxidative stress, such as lexA, soda, sodB, aphC, dinI,
umuC, rtpR, and iscA (Figure 4E). As expected the highest 20 were not exactly the
same for the KO and PGFam groupings, but most of the expected gene groups were
10 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
found in both. The higher PTDEX score observed for PGFam groups is likely due to
more similar regulation of isofunctional homologues than of functional orthologs.
Therefore, to derive more accurate conclusions, we used PGFam PTDEX scores for
the remainder of our analysis.
PTDEX Scores Enable Comparisons Between Stress Responses of Evolutionary
Distinct Bacterial Groups
The gene expression data in PATHOgenex were collected from diverse bacterial
species (Figure 1A) with distinctive gene content and transcriptional regulatory
networks. Common transcriptional regulatory patterns, between for example Gram-
negative and -positive bacteria, are anticipated to be rare. Therefore, we also re-
clustered genes of Gram-negative (21 strains from Gammaproteobacteria,
Betaproteobacteria, and Epsilonproteobacteria) and Gram-positive strains (9 strains
from Bacilli) into G- and G+ PGFam groups and then re-computed the PTDEX scores
separately (Table S5 and S6). We then generated a similarity matrix with the Pearson
correlation coefficient for PTDEX scores from 7105 G- PGFam groups representing 47
468 genes and 2903 G+ PGFam groups representing 14 200 genes for each stress
condition. It was here obvious that despite the substantial difference in PGFam groups
(Figure 5A), parts of the responses to external changes were indeed shared by Gram-
negative and -positive bacteria (Figure 5B). The conditions that showed highest
overlaps were hypoxia, nutritional, and stationary phase.
Despite this similarities there is significant differences in responses by Gram-positive
versus Gram-negative strains, which supports previous studies (Rodionov, 2007).
Transcription factors, especially global regulators, are poorly conserved in general, and
transcriptional regulation has been found to be more flexible than their target genes
(Lozada-Chavez et al., 2006). In support of this, an analysis of differential expression
11 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
of transcription factors in response to the different stresses revealed high diversity
between Gram-negative and Gram-positive strains, as expected (Figure S4). Most of
the transcription factors were unique to either Gram-positive or -negative bacteria, and
the few homologs present were mostly regulated differently. Further, the observed
difference in response to bile, where Gram-positive, but not Gram-negative bacteria
responded with expensive differential gene expression (Figure 3C) was also reflected
here, with very few high PTDEX score transcription factors for these conditions in
Gram negatives (Figure S4).
PTDEX Scores Enable Identification of Intersections Between Stress Responses
Reponses to different stressors are expected to partly overlap due to involvement of
the same molecular pathways, as well as functional diversity of gene products. Many
stress responses are for example expected to involve halted growth, enabling
translational adjustments and redirection of resources. To identify overlaps between
different stress responses, the PTDEX scores of G-/G+ PGFam groups were used to
compare responses among the Gram-negative and -positive bacteria separately.
These comparisons indicated a higher degree of overlap between responses in Gram-
negative bacteria, where especially hypoxia, stationary phase, and nutritional
downshift overlapped to a high extent. Furthermore, also responses to low iron,
oxidative and nitrosative stress, as well as responses to acidic and osmotic stress
overlapped (Figure 5C).
To reveal overlapping pathways and processes, we next implemented a co-expression
module identification algorithm (Russo et al., 2018) on G- and G+ PGFam PTDEX
scores. This algorithm generated six modules of PGFam groups for Gram-negatives
showing common patterns of PTDEX scores in certain combinations of stress
conditions (Figure 5D, Table S5). KEGG pathway mapping of the genes in these
12 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
modules indicated overlapping pathways and processes involved in different stress
responses (Figure 5D, N-module 1-6). It was obvious that the response to hypoxia
partly overlapped with responses from all of the other stress conditions (N-module 1
and 3-5), except the response to osmotic stress that seemed to be unique (N-module
6). The largest overlap was seen between responses to hypoxia, nutritional downshift
and stationary phase, reflecting the complexity and high requirement of multiple
pathways for managing these situations. The overlapping expression patterns included
many genes involved in carbon metabolism, protein synthesis, oxidative
phosphorylation and quorum sensing (N-module 1). Responses to low iron, nitrosative,
and oxidative stress also shared PTDEX score patterns (N-module 2). In Gram-positive
strains, the degree of overlap between stress responses was lower (Figure 5C) and
only three modules were generated (Figure 5D, P-module 1-3). The largest overlap
here was between hypoxia and stationary phase (P-module 1), and to a lesser extent
between nutritional downshift and temperature (P-module 2). In contrast to that seen
for Gram-negative strains, some PGFams groups with high PTDEX scores were
specific to hypoxia only (Figure 5D, P-module 3).
PTDEX Scores Reveal Universal Stress Responders Including Known
Antimicrobial Targets
Although the Gram-negative and -positive bacterial pathogens studied here differ in
gene content, the preserved genes may have fundamental roles for bacterial
physiology and adaptation to stressful environments. Participation in multiple stresses
can be key to their conservation during the evolutionary diversification of species. To
retrieve genes encoding these universal stress responders (USRs), we selected the
PGFam groups that were differentially expressed in at least six conditions and
contained at least 11 genes from Gram-negative strains (21 strains in total) and 4
13 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
genes from Gram-positive strains (9 strains in total) (Table S7). We used the general
PGFam PTDEX scores and the threshold for ‘high PTDEX score’ was determined to
be ≥0.25, which indicated the score value where at least 50% of the genes in the group
are differentially expressed (Figure S5). These criteria allowed identification of 421
USRs. Functional clustering of the USRs with PATRIC subsystems (Wattam et al.,
2017) revealed that USRs are involved basic biological processes (Figure S6).
Interestingly, 24 of the identified USRs were genes representing targets for antibiotics,
where mutations in their sequences been shown to confer antibiotic resistance (Table
1). Hence, it is thus likely that novel putative antibiotic targets are to be found among
remaining 397 USRs.
Exploring Stress Related Expression from Non-coding Regions
In addition to differential expression of annotated genes, this data set also provides
information of the complete transcriptional landscapes in different pathogens.
Analysing overall transcription, we surprisingly observed that a substantial proportion
of RNAs were transcribed from non-coding DNA regions, including sRNAs, regulatory
non-coding RNAs, 5’ untranslated regions (UTRs), 3’-UTRs, and intergenic regions
(Table S2). The proportion is expected to be even higher than calculated here since
the majority of sRNAs were excluded during sample preparation due to the >100 nt
cut-off for library preparation. In some bacteria, expression of non-CDSs was higher
than CDSs in hypoxia, nutritional downshift, and stationary phase conditions. Such a
large number of transcripts from non-CDSs in bacteria has not been reported
previously, likely due to studies focusing on CDSs usually only consider annotated
CDSs for mapping reads, and that studies of non-coding RNAs generally use RNA
purification methods dedicated to sRNAs. Noteworthy, for most species, the percent
of transcripts from non-CDS was increased upon stress exposure in comparison to
14 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
control (Figure 6A). This indicates that expression from non-coding regions also
contribute bacterial response to stress conditions.
Stress Responses Includes tmRNA and Various non-coding RNAs
To reveal the non-CDS transcripts responsible for the observed increases of non-CDS
transcripts we analyzed the read mappings and found that the stress associated
increases in many cases correlated with increased thexpression of tmRNA (Figure
6B). This is interesting as tmRNA, also known as SsrA, is a conserved and abundant
type of RNA in bacteria that is important for trans-translation, a process that restores
translation in detrimental situations (Gueneau de Novoa and Williams, 2004; Keiler,
2007; Moore and Sauer, 2005). For many bacteria, lack of tmRNA results in growth
defects, sensitivity to various stress conditions, and attenuation of virulence (Brito et
al., 2016; Munavar et al., 2005; Okan et al., 2006; Svetlanov et al., 2012). The
increased expression of tmRNA (Figure 6B, Table S8) indicates that trans-translation
is part of a common adaptation to challenging environments. The induction of tmRNA
are particularly high in stationary phase, hypoxia, and upon nutritional downshift, which
indeed are conditions associated with halted growth. In line with this, smpB, encoding
a protein that interacts with tmRNA, was here identified as a USR with high PTDEX
scores in stationary phase, hypoxia, and nutritional downshift (Figure S6).
The level of tmRNA was however not high for all strains and conditions (Figure 6C),
suggesting that also other non-coding RNAs are induced upon external stress. This
was for example the case for Klebsiella pneumoniae (Figure 6B), where an analysis
of the mapped reads across its genome revealed that expression of the carbon storage
regulatory non-coding RNA CsrB correlated with expression ratios for the intergenic
regions in many the stress conditions (Figure 6D). CsrB in Klebsiella spp. has not been
extensively investigated, but has in other species been shown to be involved in
15 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
regulation of many biological processes, such as carbon metabolism, biofilm formation,
quorum sensing, and virulence (Fortune et al., 2006; Lenz et al., 2005; Liu et al., 1997).
However, even though expression of CsrB correlated with increased transcription of
non-CDSs in most of the conditions, this was not the case for stationary phase (Figure
6D). For the stationary phase we found high extent read mapping to 5’-UTR and 3’-
UTR regions of a CDS (KPN_01149) encoding a hypothetical protein (Table S2,
Figure 6D and E). The expressed region started at a position previously reported to
be a transcriptional start site (Kim et al., 2012), supporting the accuracy of this finding
(Figure 6E). The function of KPN_01149 is not known, but the presence of stress-
induced bacterial acidophilic repeat motifs supports a role in stress responses.
Similar to that of K. pneumoniae also S. aureus showed expression from non-coding
regions that not corresponded with expression of tmRNA. Here we found high level of
reads mapping to a 1230-nt intergenic region situated on the negative strand between
two CDSs encoding a hypothetical protein and an AraC family transcriptional regulator
(Figure 6F and G). Interestingly, by analysing data from a previous study (Szafranska
et al., 2014) we found that expression of this intergenic region is increased during S.
aureus infection (Figure S7B). This sequence is only found in S. aureus and two other
Staphylococcus spp., S. argenteus and S. schweitzeri. Though S. argenteus is known
to colonize humans, S. schweitzeri isolates have been recovered from fruit bats and
non-human primates, including gorilla (Becker et al., 2019). Given the broad repertoire
of different environmental conditions and the number of species, detailed analyses of
mapped reads in the different data sets provide opportunities for finding novel
intergenic non-coding RNAs or UTRs. Both K. pneumoniae and S. aureus are
examples of human pathogens responsible for causing troublesome infections, often
16 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
involving biofilm and long-term treatments, thereby contributing to increased antibiotic
resistance.
The PATHOgenex RNA Atlas Provides Opportunities for Exploring Gene
Expression in Human Pathogens
All transcription data have been collected in an interactive database designed to serve
researchers, the PATHOgenex RNA atlas (www.pathogenex.org). The data can be
browsed by selecting the strain of interest and searched with one or multiple locus
tags, protein ID, gene product, and PGFam groups. PATHOgenex provides many
opportunities for exploring regulation and function of pathways and gene products in
human pathogens.
As example, we here show differential expression of type VI secretion systems
(T6SSs) in P. aeruginosa PAO1. The genes encoding effectors and other components
of T6SS are organized in operons, and many bacteria harbor multiple T6SS operons.
Different operons are thought to be regulated differently and have different functions,
but current knowledge about their regulation is limited. Utilizing PATHOgenex, we can
now retrieve information of their regulation, supplying a basis for further studies.
PATHOgenex retrievals show that the three P. aeruginosa PAO1 T6SS operons are
regulated under different stress conditions. T6SS-1 is induced upon exposure to acidic
and oxidative stress and to some extent upon nitrosative and osmotic stress but
repressed in hypoxia and stationary phase conditions. T6SS-2 is not induced by
oxidative stress and repressed both by temperature raise and during stationary phase,
T6SS-3 on the other hand is induced during stationary growth, implying a different
function and regulation for these effectors (Figure 7A).
PATHOgenex also provides opportunities for identification of gene products regulated
in responses to certain conditions by utilizing condition-specific PTDEX scores. As an
17 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
example, when retrieving the 5 PGFam groups with highest PTDEX scores in virulence
inducing condition, the bifunctional acetaldehyde dehydrogenase/alcohol
dehydrogenase adhE appeared in the top (Figure 7B). This gene product is known to
be involved in bacterial metabolism, but has also been attributed other functions such
as stress associated translational control and virulence (Beckham et al., 2014; Kurylo
et al., 2018). PTDEX scores of adhE and others can be analyzed further by retrieving
information on which bacteria have this gene and how this gene is regulated in the
different species (Figure 7C). The degree of fold-change is important to know for data
assessment as small fold changes in highly expressed genes are expected to have a
greater impact than small changes in genes with low expression. Therefore,
PATHOgenex also offers possibilities for retrieving information of expression levels for
individual species (Figure 7D).
Taken together, PATHOgenex provides comprehensive information of gene
expression at the gene level in a particular species, but also to a broader level including
gene groups and expression in multiple species. All of this information provides new
opportunities for researchers to generate novel hypotheses and design experiments
accordingly.
Acknowledgments
We are grateful to all the expert labs for opening their lab to us and help with bacterial
culturing and stress exposure experiments. We thank Drs. Peter Lind, Teresa Frisan,
and Saskia Erttmann for critical reading of the manuscript. The work was supported by
Knut and Alice Wallenberg Foundation (No. 2016.0063), Swedish Research Council
(No. 2018-02855), and Insamlingsstiftelsen, Medical Faculty at Umeå University to M
Fallman; Novo Nordisk Foundation (K. Avican was partly supported by grant, No.
NNF17OC0026486, awarded to Dr. Emmanuelle Charpentier at MIMS, The Laboratory
18 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
for Molecular Infection Medicine Sweden); ERC (Starting grant, No. 716063) to J.
Tang; Academy of Finland Research Fellow Grant (No. 317680) to J. Aldahdooh.
Author Contributions
K.A. and M.F. conceived and supervised the project, and wrote the manuscript with
input from J.A, M.T., J.T., K.B., and M.R.. K.A. performed the experiments and
analysed the data. M.T., K.B., K.A, and M.F. designed and M.T. calculated PTDEX
scores. J.A., J.T., K.A., and M.F. designed and J.A. constructed the PATHOgenex
RNA atlas and webpage.
Declaration of Interests
Authors declare no computing interests.
METHOD DETAILS
Bacterial growth and stress exposures
All bacterial strains shown in Figure 1A were grown in their optimal growth medium
and temperatures as indicated in Table S1 in laboratories specialized in each species.
Three bacterial cultures were grown overnight and then subcultured to a new culture
vial with 1:50-1:100 dilution. The cultures were grown until exponential phase with an
OD600 of 0.1-0.6 and exposed to the 11 stresses separately. The stress conditions,
reagents used to generate them, and the exposure times are given in Table S1. Un-
exposed cultures at exponential growth were used as controls for differential
expression analysis. For nutritional downshift, bacterial cultures were spun down at
9000 rpm for 2 minutes, the supernatant removed, the pellets resuspended in 1X M9
supplemented with 0.1 M MgCl2 and 0.1 M CaCl2, and incubated for 30 minutes at
ambient temperature for each strain. For hypoxia, 2 ml screw cap tubes were filled with
19 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
bacterial culture at exponential growth and the caps were screwed on without leaving
space for air. The stress exposures were stopped by adding 5% (final concentration)
phenol:ethanol solution (Figure S1).
Total RNA isolation
One milliliter of triplicate bacterial cultures exposed to the stresses and un-exposed
control culture were immediately pelleted by centrifugation at 9000 rpm at room
temperature for 2 minutes after adding phenol:ethanol. The supernatants were
removed and pellets resuspended in 0.5 ml Trizol solution. For Gram-negative
bacteria, cells were homogenized in Trizol solution by pipetting up and down 15 times.
For Gram-positive bacteria, culture suspensions in Trizol were transferred to previously
cooled bead beater tubes containing 0.1-mm glass beads and treated with Mini-
Beadbeater (Biospec Products Inc, USA) twice at a fixed speed for 45 seconds, and
then cooled on ice for 1 minute between the treatments. Culture homogenates were
incubated at room temperature for 5 minutes and then supplemented with 0.2 ml
chloroform, thoroughly mixed by shaking 10 times, and incubated for 3 minutes. After
centrifugation at 12 000 g at 4°C for 15 minutes, the aqueous upper phase was
carefully transferred to new RNase free tubes. An equal volume of 99% ethanol was
added to the aqueous phase and isolation continued using the Direct-Zol RNA
Miniprep Plus (Zymo Research, USA) RNA purification kit protocol. Total RNAs were
eluted in RNase free water in RNase free tubes. The total RNA concentrations were
measured using the Qubit BR RNA Assay Kit (ThermoFisher Scientific, USA) and RNA
integrity confirmed on a 0.8% agarose gel in TBE buffer.
RNA-seq library preparation with RNAtag-Seq
20 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
All rRNA-depleted RNA-seq library preparations were performed according to Shishkin
et al. (2015), with minor modifications. RNAtag-Seq allows multiple library preparations
in one tube, with initial tagging of total RNA samples with modified (5’P and 3’ SpcC3)
DNA barcoded adaptors, each harboring a unique 8 nt sequence used to demultiplex
individual libraries after sequencing. We combined the library preparation of three
biological replicates from 11 stress-exposed samples and the un-exposed control
sample, resulting in 36 library preparations in one tube for each bacterial strain. To tag
the 36 replicates, 36 unique barcoded adaptors were used, with every three barcodes
used for samples from the same stress conditions in all bacterial species (Table S2).
A total of 100 ng total RNA was used for each biological replicate. The total RNA was
fragmented in 2X FastAP Thermosensitive Alkaline Phosphatase buffer for 3 minutes
at 94°C, DNase treated, and dephosphorylated with a combination of TURBO DNase
and Thermosensitive Alkaline Phosphatase in 1X FastAP buffer for 30 minutes at
37°C. Fragmented, DNase-treated, and dephosphorylated total RNAs were cleaned
with a 2X reaction volume of Agencourt RNAClean XP beads. Cleaned total RNAs
were incubated with 100 pmol of the unique DNA barcode adaptors at 70°C for 3
minutes, and then ligated with T4 RNA ligase 1 for 90 minutes at 22°C. The ligation
was stopped and the enzyme denatured by the addition of RLT buffer. The 36
denatured ligation mixes were then pooled and cleaned in a Zymo Clean &
ConcentratorTM-5 column according to the manufacturer’s 200 nt cut-off protocol. The
RNAs were eluted in 32 µl of RNase free water. Ribosomal RNA was depleted using
the Ribo-ZeroTM Magnetic Gold Kit (Bacteria) according to the manufacturer’s
instructions. The first strand cDNA of each pool was generated using an AffinityScript
Multiple Temperature cDNA synthesis kit with 50 pmol of AR2 primer at 55°C for 55
minutes. The RNA was degraded by adding a 10% reaction volume of 1 N NaOH at
21 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
70°C for 12 minutes and the reaction neutralized with an 18% reaction volume of 0.5
M acetic acid. After cleaning the reverse transcription primers with a 2X reaction
volume of RNAClean XP beads, the 3Tr3 adapter was ligated with T4 RNA ligase 1 at
22°C with overnight incubation. The second ligation was cleaned first with a 2X, and
secondly 1.5X, reaction volume of RNAClean XP beads. The cDNA was then used as
the template for PCR reaction with FailSafeTM PCR enzyme mix using 12.5 pmol
2P_univP5 as forward primers and 12.5 pmol ScriptSeqTM Index (barcode) PCR
primers as reverse primers. The PCR cycles were as follows: 95°C for 3 minutes,
followed by 12 cycles at 95°C for 30 seconds, 55°C for 30 seconds, and 68°C for 3
minutes, and then finishing at 68°C for 7 minutes. The PCR product was cleaned first
with a 1.5X, and secondly 0.8X, reaction volume of AMPure beads and eluted in
RNAse free water. The library concentrations were measured using the QubitTM dsDNA
HS Assay Kit and the library insert size determined by the Agilent DNA 1000 Kit in a
2100 Electrophoresis Bioanalyser Instrument (Agilent, USA).
QUANTIFICATION AND STATISTICAL ANALYSIS
RNA-seq data analysis
The RNA-seq libraries generated by RNAtag-Seq were sequenced by either single end
or paired end Illumina sequencing at SciLifeLab, Stockholm. Each library harboring a
pool of 36 RNA samples was demultiplexed according to the unique 8 nt on the ligated
barcode adaptors to separate reads from each replicate. The sequencing reads
generated in this study were deposited in GEO with accession number GSEXXXXXX.
Demultiplexed reads were then mapped to the genome of the species to which the
RNAs belong in an annotation-independent manner. The accession numbers of the
RefSeq reference genomes used for each strain can be found in “Microbes and
Viruses” in Key Resouces. The number of reads mapped to CDSs, rRNA, and tRNA
22 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
was calculated according to the annotation files. The number of reads mapped to the
non-coding regions were calculated by subtracting the reads mapped to the annotated
regions from the total number of mapped reads. The sequencing reads from primary
transcripts of K. pneumoniae (Kim et al., 2012) with accession number SRR408498
were downloaded from Sequence Read Archive (SRA) and mapped to the same
reference genome used in this study. In vivo and in vitro RNA-seq reads of S. aureus
strain 6850 (Szafranska et al., 2014) with accession number ERP005459 were
downloaded from SRA and mapped to RefSeq reference genome CP006706. All of
the demultiplexing and read mapping steps were performed in CLC Genomic
Workbench (Qiagen, USA).
For differential gene expression analysis, the trimmed mean of M values (TMM)
(Robinson and Oshlack, 2010) normalization method was used to normalize the
sequencing depth of the individual libraries. For each bacterial species, the
comparisons were performed between multiple groups and the control sample so that
the full dataset could be used for fitting the generalized linear model. Using the
complete datasets with multiple group comparisons allowed us to determine whether
a gene has unstable expression for which the variation is random or is differentially
expressed in response to the stress. Read values for genes with a maximum group
mean expression (the maximum average TPM value in the statistical comparison
group) <20 were removed and a threshold of 1.5-fold change with FDR-adjusted p-
value <0.05 was employed to determine differential expression.
Generation of a phylogenetic tree of 32 bacterial strains
A phylogenetic tree of the 32 strains based on NCBI taxonomy was generated with
PhyloT in Newick format. The visualization of the tree was achieved using iTol.
23 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Clustering 32 bacterial strains based on expression profiles
Hierarchical clustering, with Euclidian distance, of 32 strains based on the percent of
genes expressed (TPM≥10) in all 12 conditions, percent of genes expressed in at least
one condition, and percent of genes not expressed in any of the conditions was
performed with ClustVis (Metsalu and Vilo, 2015).
Clustering 105 088 genes based on functional orthologs and isofunctional
homologs
The KEGG orthology group was assigned for clustering genes based on functional
orthologs. The amino acid sequences of all CDSs for each strain were uploaded to
GhostKOALA (Kanehisa et al., 2016) to assign the best KO group in the
genus_prokaryotes KEGG GENES database. Clustering based on isofunctional
homologs was performed with PATRIC’s Proteome Comparison Service. The amino
acid sequences of all CDSs for each strain were compared using its own annotated
genome, if available, or the closest strain to assign the best PGFam group for each
CDS.
PTDEX score calculation
The PTDEX score was calculated for the orthology/homology groups using Python
scripts and Jupyter notebooks, relying on the open-source libraries NumPy and
Pandas. The PTDEX score formula relies on the definition of differentially expressed
genes given in “RNA-Seq Analysis” above and the equation shown in Figure 2B, where
ngenes(on) is the number of differentially expressed genes in the orthology/homology
gene group, ngenes(total) is the total number of genes in the group, nstrains(on) is the
number of strains for which at least one of the orthology group genes is differentially
regulated, nstrains(total) is the total number of strains for a given stress condition, and
nstrains(database) is the total number of strains in the database (nstrains(database) =32
24 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
for general PTDEX scores, 21 for Gram negative-specific, 9 for Gram-positive-specific
PTDEX scores). The additional logarithmic component is a weight factor to give more
relative importance to orthology/homology groups with more genes.
Generation of co-expression modules on PGFam PTDEX scores and associated
KEGG pathway enrichments
The co-expression Modules Identification tool (CEMiTool) (Russo et al., 2018) was
implemented on PGFam groups of Gram-negative and -positive strains to find PGFam
groups with similar PTDEX score patterns over the 11 stress conditions. PGFam
groups associated with each module can be found in Table S5 and Table S6. Each
PGFam group associated with the particular module, if possible, was converted in KO
groups and KO groups were used in the KEGG orthology database to map pathways.
PATHOgenex website construction
PATHOgenex was built using PHP Laravel Framework 6.18.2 and PHP 7.4.4 for
server-side data processing, Javascript ECMAScript 2015 for the front end, and D3.js
5.16.0 and Plotly 1.40.0 libraries for the generation of the interactive visualizations.
Linux distribution CentOS-8 with the 64-bit kernel 4.18.0 running on four processor
cores and 64 Gb of RAM is used to host the web service on the in-house computational
cluster.
References
Avican, K., Fahlgren, A., Huss, M., Heroven, A.K., Beckstette, M., Dersch, P., and Fallman, M. (2015). Reprogramming of Yersinia from virulent to persistent mode revealed by complex in vivo RNA-seq analysis. PLoS pathogens 11, e1004600. Becker, K., Schaumburg, F., Kearns, A., Larsen, A.R., Lindsay, J.A., Skov, R.L., and Westh, H. (2019). Implications of identifying the recently defined members of the Staphylococcus aureus complex S. argenteus and S. schweitzeri: a position paper of members of the ESCMID Study Group for Staphylococci and Staphylococcal Diseases (ESGS). Clin Microbiol Infect 25, 1064-1070. Beckham, K.S.H., Connolly, J.P.R., Ritchie, J.M., Wang, D., Gawthorne, J.A., Tahoun, A., Gally, D.L., Burgess, K., Burchmore, R.J., Smith, B.O., et al. (2014). The metabolic
25 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
enzyme AdhE controls the virulence of Escherichia coli O157:H7. Molecular microbiology 93, 199-211. Bolin, I., Forsberg, A., Norlander, L., Skurnik, M., and Wolf-Watz, H. (1988). Identification and mapping of the temperature-inducible, plasmid-encoded proteins of Yersinia spp. Infection and immunity 56, 343-348. Brito, L., Wilton, J., Ferrandiz, M.J., Gomez-Sanz, A., de la Campa, A.G., and Amblar, M. (2016). Absence of tmRNA Has a Protective Effect against Fluoroquinolones in Streptococcus pneumoniae. Front Microbiol 7, 2164. Davis, J.J., Gerdes, S., Olsen, G.J., Olson, R., Pusch, G.D., Shukla, M., Vonstein, V., Wattam, A.R., and Yoo, H. (2016). PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database. Front Microbiol 7, 118. Ermolaeva, S., Novella, S., Vega, Y., Ripio, M.T., Scortti, M., and Vazquez-Boland, J.A. (2004). Negative control of Listeria monocytogenes virulence genes by a diffusible autorepressor. Mol Microbiol 52, 601-611. Fang, F.C., Frawley, E.R., Tapscott, T., and Vazquez-Torres, A. (2016). Bacterial Stress Responses during Host Infection. Cell Host Microbe 20, 133-143. Fortune, D.R., Suyemoto, M., and Altier, C. (2006). Identification of CsrC and characterization of its role in epithelial cell invasion in Salmonella enterica serovar Typhimurium. Infection and immunity 74, 331-339. Gueneau de Novoa, P., and Williams, K.P. (2004). The tmRNA website: reductive evolution of tmRNA in plastids and other endosymbionts. Nucleic Acids Res 32, D104- 108. Haas, B.J., Chin, M., Nusbaum, C., Birren, B.W., and Livny, J. (2012). How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes? Bmc Genomics 13. Heine, W., Beckstette, M., Heroven, A.K., Thiemann, S., Heise, U., Nuss, A.M., Pisano, F., Strowig, T., and Dersch, P. (2018). Loss of CNFY toxin-induced inflammation drives Yersinia pseudotuberculosis into persistency. PLoS Pathog 14, e1006858. Hofmann, A.F., and Hagey, L.R. (2008). Bile acids: chemistry, pathochemistry, biology, pathobiology, and therapeutics. Cell Mol Life Sci 65, 2461-2483. Ishikawa, T., Sabharwal, D., Broms, J., Milton, D.L., Sjostedt, A., Uhlin, B.E., and Wai, S.N. (2012). Pathoadaptive conditional regulation of the type VI secretion system in Vibrio cholerae O1 strains. Infection and immunity 80, 575-584. Kanehisa, M., Sato, Y., and Morishima, K. (2016). BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences. Journal of molecular biology 428, 726-731. Keiler, K.C. (2007). Physiology of tmRNA: what gets tagged and why? Current opinion in microbiology 10, 169-175. Kim, D., Hong, J.S., Qiu, Y., Nagarajan, H., Seo, J.H., Cho, B.K., Tsai, S.F., and Palsson, B.O. (2012). Comparative analysis of regulatory elements between Escherichia coli and Klebsiella pneumoniae by genome-wide transcription start site profiling. PLoS genetics 8, e1002867. Kroger, C., Colgan, A., Srikumar, S., Handler, K., Sivasankaran, S.K., Hammarlof, D.L., Canals, R., Grissom, J.E., Conway, T., Hokamp, K., et al. (2013). An infection- relevant transcriptomic compendium for Salmonella enterica Serovar Typhimurium. Cell host & microbe 14, 683-695. Kurylo, C.M., Parks, M.M., Juette, M.F., Zinshteyn, B., Altman, R.B., Thibado, J.K., Vincent, C.T., and Blanchard, S.C. (2018). Endogenous rRNA Sequence Variation Can Regulate Stress Response Gene Expression and Phenotype. Cell Rep 25, 236-+.
26 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Lenz, D.H., Miller, M.B., Zhu, J., Kulkarni, R.V., and Bassler, B.L. (2005). CsrA and three redundant small RNAs regulate quorum sensing in Vibrio cholerae. Molecular microbiology 58, 1186-1202. Liu, M.Y., Gui, G., Wei, B., Preston, J.F., 3rd, Oakford, L., Yuksel, U., Giedroc, D.P., and Romeo, T. (1997). The RNA molecule CsrB binds to the global regulatory protein CsrA and antagonizes its activity in Escherichia coli. The Journal of biological chemistry 272, 17502-17510. Loh, J.T., Torres, V.J., and Cover, T.L. (2007). Regulation of Helicobacter pylori cagA expression in response to salt. Cancer Res 67, 4709-4715. Lozada-Chavez, I., Janga, S.C., and Collado-Vides, J. (2006). Bacterial regulatory networks are extremely flexible in evolution. Nucleic Acids Res 34, 3434-3445. Lukacs, G.L., Rotstein, O.D., and Grinstein, S. (1991). Determinants of the phagosomal pH in macrophages. In situ assessment of vacuolar H(+)-ATPase activity, counterion conductance, and H+ "leak". The Journal of biological chemistry 266, 24540-24548. Lund, P., Tramonti, A., and De Biase, D. (2014). Coping with low pH: molecular strategies in neutralophilic bacteria. FEMS Microbiol Rev 38, 1091-1125. Malachowa, N., Kobayashi, S.D., Sturdevant, D.E., Scott, D.P., and DeLeo, F.R. (2015). Insights into the Staphylococcus aureus-host interface: global changes in host and pathogen gene expression in a rabbit skin infection model. PloS one 10, e0117713. Mandlik, A., Livny, J., Robins, W.P., Ritchie, J.M., Mekalanos, J.J., and Waldor, M.K. (2011). RNA-Seq-based monitoring of infection-linked changes in Vibrio cholerae gene expression. Cell Host Microbe 10, 165-174. Metsalu, T., and Vilo, J. (2015). ClustVis: a web tool for visualizing clustering of multivariate data using Principal Component Analysis and heatmap. Nucleic Acids Res 43, W566-570. Miravet-Verde, S., Llorens-Rico, V., and Serrano, L. (2017). Alternative transcriptional regulation in genome-reduced bacteria. Current opinion in microbiology 39, 89-95. Moore, S.D., and Sauer, R.T. (2005). Ribosome rescue: tmRNA tagging activity and capacity in Escherichia coli. Molecular microbiology 58, 456-466. Munavar, H., Zhou, Y., and Gottesman, S. (2005). Analysis of the Escherichia coli Alp phenotype: heat shock induction in ssrA mutants. Journal of bacteriology 187, 4739- 4751. Nuss, A.M., Beckstette, M., Pimenova, M., Schmuhl, C., Opitz, W., Pisano, F., Heroven, A.K., and Dersch, P. (2017). Tissue dual RNA-seq allows fast discovery of infection-specific functions and riboregulators shaping host-pathogen transcriptomes. Proc Natl Acad Sci U S A 114, E791-E800. Okan, N.A., Bliska, J.B., and Karzai, A.W. (2006). A Role for the SmpB-SsrA system in Yersinia pseudotuberculosis pathogenesis. PLoS pathogens 2, e6. Perez-Rueda, E., Janga, S.C., and Martinez-Antonio, A. (2009). Scaling relationship in the gene content of transcriptional machinery in bacteria. Mol Biosyst 5, 1494-1501. Prosseda, G., Falconi, M., Giangrossi, M., Gualerzi, C.O., Micheli, G., and Colonna, B. (2004). The virF promoter in Shigella: more than just a curved DNA stretch. Molecular microbiology 51, 523-537. Robinson, M.D., and Oshlack, A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 11, R25. Rodionov, D.A. (2007). Comparative genomic reconstruction of transcriptional regulatory networks in bacteria. Chem Rev 107, 3467-3497.
27 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Russo, P.S.T., Ferreira, G.R., Cardozo, L.E., Burger, M.C., Arias-Carrasco, R., Maruyama, S.R., Hirata, T.D.C., Lima, D.S., Passos, F.M., Fukutani, K.F., et al. (2018). CEMiTool: a Bioconductor package for performing comprehensive modular co- expression analyses. BMC bioinformatics 19, 56. Shishkin, A.A., Giannoukos, G., Kucukural, A., Ciulla, D., Busby, M., Surka, C., Chen, J., Bhattacharyya, R.P., Rudy, R.F., Patel, M.M., et al. (2015). Simultaneous generation of many RNA-seq libraries in a single reaction. Nature methods 12, 323- 325. Sigurlasdottir, S., Engman, J., Eriksson, O.S., Saroj, S.D., Zguna, N., Lloris-Garcera, P., Ilag, L.L., and Jonsson, A.B. (2017). Host cell-derived lactate functions as an effector molecule in Neisseria meningitidis microcolony dispersal. PLoS Pathog 13, e1006251. Svetlanov, A., Puri, N., Mena, P., Koller, A., and Karzai, A.W. (2012). Francisella tularensis tmRNA system mutants are vulnerable to stress, avirulent in mice, and provide effective immune protection. Molecular microbiology 85, 122-141. Szafranska, A.K., Oxley, A.P., Chaves-Moreno, D., Horst, S.A., Rosslenbroich, S., Peters, G., Goldmann, O., Rohde, M., Sinha, B., Pieper, D.H., et al. (2014). High- resolution transcriptomic analysis of the adaptive response of Staphylococcus aureus during acute and chronic phases of osteomyelitis. mBio 5. Wagner, G.P., Kin, K., and Lynch, V.J. (2013). A model based criterion for gene expression calls using RNA-seq data. Theory Biosci 132, 159-164. Wattam, A.R., Abraham, D., Dalay, O., Disz, T.L., Driscoll, T., Gabbard, J.L., Gillespie, J.J., Gough, R., Hix, D., Kenyon, R., et al. (2014). PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Res 42, D581-591. Wattam, A.R., Davis, J.J., Assaf, R., Boisvert, S., Brettin, T., Bun, C., Conrad, N., Dietrich, E.M., Disz, T., Gabbard, J.L., et al. (2017). Improvements to PATRIC, the all- bacterial Bioinformatics Database and Analysis Resource Center. Nucleic Acids Res 45, D535-D542.
28 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Table 1. USRs known to be target for antimicrobials Gene name Conditions Superclass Subsystem name alr (Caceres et al., 1997) Hyp Nd Oss Oxs Sp Vic Metabolism Amino acid racemase fabF (Young et al., 2006) Li Hyp Nd Ns Oss Oxs Sp Fatty acid synthesis Tm Vic fabG (Heath and Rock, 2004) As Li Hyp Nd Oss Oxs Sp Tm Vic fabI (Heath et al., 1998) Li Hyp Nd Ns Sp Tm folA (Toprak et al., 2011) Li Hyp Nd Ns Oss Oxs Sp Folate biosynthesis Tm Vic folC (Zhang et al., 2015) As Li Hyp Nd Oxs Sp Tm Vic folP (Fermer and Swedberg, As Li Hyp Nd Ns Oxs Sp 1997) Tm Vic panD (Zhang et al., 2013) Li Hyp Nd Oss Oxs Sp Coenzyme A biosynthesis gidB (Okamoto et al., 2007) As Li Hyp Nd Oss Oxs Sp Cellular processes Cell division initiation Tm Vic murA (De Smet et al., 1999) Hyp Nd Ns Oxs Sp Tm Vic Bacterial cell division fusA (Price and Gustafson, As Li Hyp Nd Ns Oxs Sp Protein processing Translation 2001) Tm Vic elongation tuf (Cetin et al., 1996) Li Hyp Nd Ns Oss Oxs Sp Tm Vic ileS (Antonio et al., 2002) As Li Hyp Nd Ns Oss Oxs tRNA aminoacylation Sp Tm Vic rplC (Locke et al., 2009) As Li Hyp Nd Ns Oss Oxs Ribosomal proteins Sp Tm Vic rplF (Norstrom et al., 2007) As Bs Li Hyp Nd Ns Oxs Sp Tm Vic rpsE (Gomez et al., 2017) As Li Hyp Nd Ns Oss Oxs Sp Tm Vic rpsJ (Kubanov et al., 2016) As Li Hyp Nd Ns Oss Oxs Sp Tm Vic rpsL (Traub and Nomura, 1968) As Li Hyp Nd Ns Oss Oxs Sp Tm Vic gyrA (Nakamura et al., 1989) Li Hyp Nd Ns Oxs Sp Vic DNA processing DNA topoisomerases gyrB (Schmutz et al., 2003) Li Hyp Nd Ns Oxs Vic parC (Tankovic et al., 1996) Li Hyp Nd Ns Oxs Sp Tm Vic parE (Nawaz et al., 2015) Hyp Nd Oss Oxs Sp Tm rpoB (Lisitsyn et al., 1984) Li Hyp Nd Ns Oxs Sp Tm RNA processing RNA polymerase Vic rpoC (Friedman et al., 2006) As Li Hyp Nd Ns Sp Tm Vic
29 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Figure Legends
Figure 1. Experimental setup for PATHOgenex RNA atlas
(A) Phylogenetic clustering of bacterial species included in the study. Phylogenetic
orders as well as Gram staining groups and oxygen dependency are indicated by color.
(B) Schematic overview of the 11 infection-related stress conditions and control (Ctrl)
sample not exposed to any stress: acidic stress (As), pH 3-5 for 10 minutes; bile stress
(Bs), 0.5% bile salts for 10 minutes; low iron (Li), 250 μM 2,2-dipyridryl for 10 minutes;
hypoxia (Hyp), low oxygen for 4 hours; nutritional downshift (Nd), incubation in 1X M9
salts for 30 minutes; nitrosative stress (Ns), 250 μM Spermine NONOate for 10
minutes; osmotic stress (Oss), 0.5% NaCl for 10 minutes; oxidative stress (Oxs), 0.5-
10 mM H2O2 for 10 minutes; stationary phase (Sp); temperature (Tm), 41°C for 20
minutes; varied virulence inducing condition (Vic). Some adjustments were made for
certain bacteria (see Table S1).
(C) Schematic illustration of the RNAtag-Seq approach allowing multiple samples (36
in this study) per library used for obtaining 1 122 transcriptomes deposited in the
PATHOgenex RNA atlas.
Figure 2. Niched pathogens have high proportion of constitutively active genes
(A) Diagrams showing; number of genes (gray), percentage of constitutively active
genes (red), percentage of genes active in certain (one to 11) conditions, and
percentage of constitutively silent genes. Genes with TPM≥10 are designated as
active, and TPM<10 as silent. Bacterial species are sorted according to the number of
CDSs in their reference genome.
(B) Scatter plot showing percentage of constitutively active genes for each species.
30 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
(C) Scatter plot showing the number genes annotated as transcriptional regulator per
CDS for each species. Blue trend lines in (B) and (C) show the degree of linear
regression with a 95% confidence interval.
(D) Hierarchical clustering of all strains based on global gene expression profiles in
(A). The 2 main clusters C1 (red) and C2 (blue) are indicated.
(E) Dot plot showing the number of CDSs in C1 (red) and C2 (blue),. The significance
between the two groups was calculated using unpaired t-test. ****P < 0.0001.
Figure 3. Cross-Microbial Human Pathogens Dynamically Respond to Changes
in the Environment
(A) Proportion of genes differentially regulated in at least one condition for each
bacteria. The linked heat maps show proportions of regulated genes for each stress
condition. The scales to the right show zero (white) to highest (green) proportion for
each species. Heat map squares marked with a cross indicate that RNA-seq was not
performed for that specific condition.
(B) Dot plot showing the percentages of genes differentially regulated in each condition
for all included species. The bar plot show the mean value for each condition.
(C) Dot plot showing percentages of differentially regulated genes in each condition for
Gram-negative (gray) and -positive bacteria (orange). The bar plots show the mean
value for each condition. The significance between the groups was calculated with
Multiple t-test using Holm-Sidak method. * indicates adjusted p-value < 0.05.
Figure 4. Transformation of differential expression to PTDEX scores enables
comprehensive understanding of gene regulation under stress conditions.
(A) Schematic illustration of clustering of 105 088 genes based on functional orthology
(KO) and isofunctional homology (PGFam) and resulting KO and PGFam gene groups.
(B) Equation used to calculate PTDEX scores for KO and PGFam gene groups.
31 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
(C) Illustration of the transformation of differential expression values of genes from
many pathogens that are clustered in one gene group into a PTDEX score for each
stress condition. Eight gene groups with highest PTDEX scores in nitrosative stress
condition are shown as example. White rectangles indicate no differential expression.
(D) The 20 KO and PGFam groups with highest PTDEX scores in low iron (top) and
oxidative stress (bottom) conditions. The size of gray dots after the gene name relates
to the number of genes in the gene groups.
Figure 5. Gram-negative and -positive bacteria exhibit distinct responses.
(A) Venn diagram of individual and shared PGFam groups with a PTDEX score >0 in
at least one of the stress conditions.
(B) Heat map showing degree of similarity between responses to different stresses in
Gram-negative and -positive bacteria. Similarity was calculated using the Pearson
correlation coefficient of PGFam groups’ PTDEX scores in each condition.
(C) Heat map showing degree of similarity between responses to different stress
conditions in Gram-negative and -positive bacteria. Similarities were calculated as in
(B) and the similarity distances shown in a dendrogram.
(D) Modules generated by CemiTool showing conditions that involve similar regulation
of gene groups with high PTDEX scores in Gram-negative and -positive bacteria. n
indicates number of PGFam groups within each module. The most enriched KEGG
pathways/processes for each module are shown with the number of gene groups
indicated in brackets. See also Table S5 and S6.
Figure 6. Complex stress conditions commonly involve differential expression
from non coding regions, including tmRNA and novel non-coding RNAs.
32 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
(A) Ratio of percent of reads mapped to non-CDSs under each stress condition in
comparison to control for all tested strains. Different stress conditions are indicated by
color. The gray line indicates Stress/Control value equal to one.
(B) Proportion of reads mapped to tmRNA sequences of total number of mapped reads
under each stress condition for all tested strains.
(C) Proportion of reads mapped to tmRNA sequences of total number of reads mapped
to non-CDSs under each stress condition for all tested strains.
(D) Proportion of reads mapped to the K. pneumoniae CsrB and to the region covering
KPN_01149 CDS with 5’-UTR and 3’-UTR sequences under each stress condition.
(E) Number and aligments of reads mapped to KPN_01149 and its UTRs in stationary
phase. Green indicates forward reads, red indicates reverse reads, and blue indicates
paired reads. The vertical dashed line indicates the position of the TSS of KPN_01149
(Kim et al., 2012).
(F) Proportion of reads mapped to the 1230-nt intergenic region in S. aureus MRSA
and MSSA genomes under each stress condition.
(G) Number of reads mapping to the 1230-nt region in S. aureus MRSA and MSSA
strains in stationary phase. Colors represent reads as noted in (E).
Different stress conditions are indicated with colored dots on the right.
The means of three replicates in each condition were used for the calculation in (A),
(B), (C), (D), and (F).
Figure 7. PATHOgenex RNA atlas provides opportunities for retrival of gene
expression data including differential expression values, PTDEX scores and
gene expression values.
(A) Differential expression of the three T6SS operons in P. aeruginosa PAO1 under 11
stress conditions.
33 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
(B) Top 5 PGFam groups with highest PTDEX scores in virulence inducing condition.
(C) Differential expression of adhE in all strains carrying the gene.
(D) Expression levels (TPM) of adhE across all conditions in three strains
representative of Gram-negative and -positive strains.
The differential expression in (A) and (C) are shown as log2 fold change with FDR-
corrected p-value < 0.05.
Figure S1. Stress exposure and experimental setup. Related to Figure 1. Three
bacterial cultures were inoculated from bacteria grown in the stationary phase for the
stress exposure experiments. Cultures, except control samples, were exposed to the
stress conditions at exponential growth (OD600 0.1-0.5). Transcription was stopped by
adding 0.05% (final concentration) phenol:ethanol solution. Cells were pelleted and
homogenized in Trizol solution and stored at -80 oC until RNA extraction step. The
libraries were generated with RNAtag-Seq method from total RNA. The specific
culturing and stress conditions for each species are indicated in Table S1. L.
pneumophila and M. tuberculosis were not exposed to the virulence inducing condition,
no such condition has been described.
Figure S2. The global expression pattern of biological replicates from the same
sample cluster. Related to Figure 2. Correlation coefficients (R) of the global
expression levels were plotted on a matrix and a dendogram showing the clustering of
replicates with complete linkage (below) for each species. R values were calculated by
the Pearson method. The matrices were generated using the “corrplot” R package and
complete linkage by the CLC Genomics Workbench.
34 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Figure S3. Differential expression of genes previously known to be regulated
under particular stress conditions and percentage of differentially expressed
genes in different groups of bacteria. Related to Figure 3.
(A) Differential expression of; bfd encoding bacterioferritin-associated ferredoxin, in
low iron conditions, fnr encoding fumarate and nitrate reductase under hypoxia, cstA
encoding carbon starvation protein under nutritional downshift, groL encoding heat
shock protein under high temperature conditions, dps encoding DNA protection during
starvation in the stationary phase, ahpC encoding alkyl hydroperoxide reductase C
under oxidative stress conditions, proW encoding glycine betaine/proline transport
system permease protein under osmotic stress conditions, hmp encoding
flavohemoprotein, which is involved in NO detoxification, under nitrosative stress
conditions.
(B) Aerobic (gray) and microaerophilic bacteria (orange).
(C) Bacilli (gray), Betaproteobacteria (orange), and Gammaproteobacteria (red).
(D) C1 bacteria (gray) and C2 bacteria (orange) identified in Figure 2D.
Error bars indicate standard deviation of the mean. The significance between the
groups was calculated with Multiple t-test using Holm-Sidak method. ** indicates
adjusted p-value < 0.01.
Figure S4. Diverse transcription factors from Gram- negative and Gram- positive
bacteria shows disctinct PTDEX score patterns in bile stresses. Related to
Figure 5. PTDEX scores of sigma factors, response regulators, and one-component
systems in both Gram-negative and -positive strains. Transcription factors in the
different bacteria were identified using the P2TF database (Ortet et al., 2012).
35 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Figure S5. PTDEX score ≥0.25 is highly accurate for indicating the probability of
gene groups being regulated under a particular stress condition. Related to
Table 1. Randomly picked three PGFam groups from Gram-negative strains with
PTDEX scores between 0.25-0.3 and different numbers of genes included for each
stress condition to show how the PTDEX score indicates the differential expression of
the gene groups. Each box represents an individual gene from the gene group. Red
indicates downregulation, blue indicates upregulation, white indicates not regulated,
and crossed white indicates a gene for which expression could not be measured due
to lack of RNA.
Figure S6. PTDEX scores identify universal stress responder genes involved in
responses to multiple stresses in both Gram-negative and -positive strains.
Related to Table 1. USRs were clustered in different biological processes using the
PATRIC Subsystems tool and are shown with their general PTDEX scores over the 11
stress conditions.
Figure S7. Sequence homology of KPN_01149 paralogs in K. pneumoniae and
expression of the intergenic region in S. aureus from in vivo samples. Related
to Figure 6.
(A) Multiple sequence alignments of KPN_01149 and its paralogs in K. pneumoniae.
Sequences marked with blue indicate the 5’-UTR and 3’-UTR. Red-colored nucleotides
indicate identical nucleotides among the three sequences.
(B) Reads mapped to S. aureus 6850 intergenic region described in Figure 6G during
acute infection and chronic infection in a murine infection model, and in vitro
exponential growth (Szafranka et al., 2014).
36 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Figure 1
A Borreliella burgdorferi Mycobacterium tuberculosis Listeria monocytogenes Staphylococcus epidermidis Staphylococcus aureus (MRSA) Staphylococcus aureus (MSSA) Enterococcus faecalis Streptococcus pyogenes Streptococcus agalactiae Streptococcus suis Streptococcus pneumoniae Campylobacter jejuni Helicobacter pylori (G27) Helicobacter pylori (J99) Achromobacter xylosoxidans Burkholderia pseudomallei Neisseria meningitidis Neisseria gonorrhoeae Vibrio cholerae Francisella tularensis Legionella pneumophila Aggregatibacter actinomycet. Haemophilus influenzae Pseudomonas aeruginosa Acinetobacter baumannii Yersinia pseudotuberculosis Klebsiella pneumoniae Salmonella enterica Typhimurium Shigella flexneri Escherichia coli (ETEC) Escherichia coli (UPEC) Escherichia coli (EPEC)
Spirochaetales Gram (+) Corynebacteriales Bacilli Gram (-) Epsilonproteobacteria Diderm Betaproteobacteria Aerobic growth Gammaproteobacteria Microaerophilic growth
B Stress conditions Ctrl As Bs Li Hyp Nd Ns Oxs Oss Sp Tm Vic
Total vs. x3 RNA
C RNAtag-Seq
Total Barcoded RNA total RNA Pooled 36 barcoded rRNA depletion RNA samples Library preparation
Sequencing 1122 libraries bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Figure 2
A Number Constitutively Active in Constitutively D of genes active certain conditions silent A. xylosoxidans B. burgdorferi P. aeruginosa H.pylori (J99) E. coli (ETEC) H. pylori (G27) S. enterica Typhimurium C. jejuni E. coli (EPEC) S. pyogenes E. coli (UPEC) H. influenzae K. pneumoniae
N. gonorrhoeae C1 F. tularensis A. baumannii S. suis B. pseudomallei S. pneumoniae L. monocytogenes S. agalactiae E. faecalis A. actinomyc. S. flexneri S. epidermidis V. cholerae N. meningitidis Y. pseudotuberculosis E. faecalis S. pneumoniae S. aureus (MSSA) S. aureus (MRSA) L. monocytogenes S. aureus (MSSA) S. aureus (MRSA) L. pneumophila V. cholerae M. tuberculosis A. baumannii A. actinomyc. M. tuberculosis S. suis S. flexneri S. epidermidis Y. pseudotuberculosis S. pyogenes E. coli (UPEC) S. agalactiae S. enterica Typhimurium
C2 H. influenzae E. coli (EPEC) N. meningitidis E. coli (ETEC) F. tularensis P. aeruginosa L. pneumophila K. pneumoniae C. jejuni B. pseudomallei N. gonorrhoeae A. xylosoxidans H. pylori (G27) 0 0 0 0 20 40 60 80 20 40 60 80 20 40 60 80 H. pylori (J99) 100 100 100 2000 4000 6000 B. burgdorferi Number of genes % of constitutively active genes % of genes active in certain conditions % of constitutively silent genes
8000 0.08 B M. tuberculosis C E **** 80 0.06 6000 P. aeruginosa
0.04 4000 60
0.02 2000 Constitutively regulator/CDS
40 Transcriptional active genes(%) Number of CDSs 0.00 0 2000 3000 4000 5000 6000 2000 3000 4000 5000 6000 C1 C2 Number of CDSs Number of CDSs bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Figure 3
A As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic A. xylosoxidans 0 33 A. baumannii 0 46 A. actinomyc. 0 46 B. burgdorferi 0 43 B. pseudomallei 0 57 C. jejuni 0 29 E. faecalis 0 50 E. coli (EPEC) 0 38 E. coli (ETEC) 0 42 E. coli (UPEC) 0 44 F. tularensis 0 54 H. influenzae 0 56 H. pylori (G27) 0 57 H. pylori (J99) 0 56 K. pneumoniae 0 36 L. pneumophila 0 64 L. monocytogenes 0 63 M. tuberculosis 0 26 N. gonorrhoeae 0 31 N. meningitidis 0 44 P. aeruginosa 0 53 S. enterica Typhimurium 0 43 S. flexneri 0 46 S. aureus (MRSA) 0 48 S. aureus (MSSA) 0 47 S. epidermidis 0 56 S. agalactiae 0 42 S. pneumoniae 0 68 S. pyogenes 0 59 S. suis 0 39 Vibrio cholerae 0 51 Y. pseudotuberculosis 0 45 0 20 40 60 80 100 Differentially expressed genes (%)
B C Gram (-) 60 60 Gram (+) ** 40 40
20 20 % of regulated genes 0 % of regulated gene s 0 Li Li As Bs Ns Sp As Bs Nd Ns Sp Nd Vic Vic Tm Tm Hyp Hyp Oss Oxs Oss Oxs bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Figure 4
A Functional D Low iron 63831 6054 Orthologs genes KO groups (KOs) Size As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic 105088 Size 5 genes bfd ibpA 25 Isofunctional 97565 27999 mntC clpB 50 homologues ABC.FEV.P bfd 75 genes PGfam groups (PGfams) ABC.FEV.S sitB 100 mntB pspA 150 ibpA sitC Size (n genes) ABC.FEV.A bfr B exbD ahpC 8 n on n on exbB adhE genes ( ) • species ( ) •log 1+ n on 2 ()genes ( ) clpB dnak 6 ngenes total n total •n database tolR htpG ( ) strain ( ) strain ( ) tonB ibpB psaA, scaA sufA cspA hspQ 4 psaC, scaB tonB C TC.FEV.OM2 exbD As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic Gene groups mtsA groES 2 TC.FEV.OM feoB groES yhcN
bfr nuoA PTDEX score 0 PTDEX score calculation KO groups PGfam groups hmpA for each condition
Oxidative stress nrdG As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic Size Size 5 lexA lexA 25 As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic iscA ahpC 50 sodB ahpF 75 ahpC sodA 100 dinI mntB 150 Size (n genes) nrdD fdoG, fdhF nrfA iscR nrdD dinI katG 8 ABC.FEV.A ibpA yjjAraiA cspA adhE nfuA 6 Individual genes in different strains umuC PTDEX Scores rtpR sufA exbD 0 2 4 6 8 mntC nrfA nrfD 4 sdhA, frdA cspE grcA ahpF nrfB ABC.FEV.S nrfC 2 ABC.FEV.P pflB ibpAibpB sdhD, frdD iscS
ibpA iscA PTDEX score 0 KO groups PGFam groups (log2 FoldChange) 05 -5 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Figure 5
A B Gram-negatives As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic
As Bs Gram-positives Li Hyp Gram-positives 5981 1124 1779 Nd Ns Oss Oxs Gram-negatives Sp Tm Vic
Pearson’s r 1 0
Gram-negatives C D Gram-negatives KEGG Pathways/Proceesses KEGG Pathways/Proceesses 0 Biosynthesis of sec. metabolites (143) Biosynthesis of sec. metabolites (4) Bs 3 N-module 1 2 N-module 2 (n=768) Carbon metabolism (68) (n=61) Purine-Pyrimidine met. (4) As Ribosome (50) 2 Carbon fixation (2) Oss Biosynthesis of amino acids (42) Biosynthesis of amino acids (2) Two-component system (35) 1 Li Li Nd Oxs Cell cycle (2) 1 Hyp Sp ABC transporters (30) Ns Selenocompound met. (2) Oxs PTDEX score Oxidative phosphorylation (26) Ns 0 Quorum sensing (24) 0 Thiamine met. (1) Hyp 1 2 N-module 3 Glyoxylate and dicarboxylate met. (3) 2 N-module 4 Tryptophan met. (3) Nd (n=35) Biosynthesis of sec. metabolites (3) (n=41) Oxidative phosphorylation (2) Retinol met. (1) Glyoxylate and dicarboxylate met. (2) Sp Ns Oxs Vic 1 Pyruvate met. (1) 1 Longevity regulating pathway (2) Hyp Sp Pentose and glucuronate intercon. (1) Hyp Two-component system (2)
Tm PTDEX score Pearson’s r Pearson’s Fatty acid degradation (1) Sulfur relay system (1)
Li Tyrosine met. (1) Base excision repair (1) Bs As Ns Sp Nd 0 Vic 0 Tm Oss Oxs Hyp N-module 5 Oxidative phosphorylation (7) N-module 6 Valine, leucine and isoleucine biosyn (1) (n=30) Biosyn. of secondary metabolites (5) (n=29) Protein export (1) Oss Methane met. (4) 1 Glycerophospholipid met. (1) 1 Hyp Tm Glyoxylate and dicarboxylate met. (3) Biosynthesis of amino acids (1) Purine met. (2) Starch and sucrose met. (1) Gram-positives PTDEX score Porphyrin and chlorophyll met. (2) PTDEX score PTDEX score PTDEX score Quorum sensing (1) LPS biosynthesis (1) 2-Oxocarboxylic acid met. (1) Li 0 0 0 As Gram-positives Bs Oxs KEGG Pathways/Proceesses KEGG Pathways/Proceesses Ns 3 P-module 1 Ribosome (48) P-module 2 Biosynthesis of amino acids (8) (n=224) ABC transporters (20) (n=60) ABC transporters (6) Oss 2 Biosynthesis of amino acids (18) 2 Butanoate met. (5) Hyp Sp Nd Nd 1 Two-component system (16) Purine metabolism (4) 1 Glycolysis (14) 1 Phosphotransferase system (4) Hyp Tm Sp PTDEX score Purine-Pyrimidine met. (13) PTDEX score Fatty acid metabolism (3) 0 Quorum sensing (12) Glyoxylate and dicarboxylate met. (3) Tm 0 Vic P-module 3 ABC transporters (4) Pearson’s r Pearson’s (n=27) Degradation of aromatic compounds (3) Li
As Bs 2 Ns Sp Nd Vic Tm Glycine, serine and threonine met. (3) Oxs Oss Hyp Hyp Retinol met. (2) 1 Cysteine and methionine met. (2)
PTDEX score Glycolysis (2) 0 Pyrimidine met. (2) bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Figure 6
A
A. xylosoxidansA. baumanniiA. actinomyc.B. burgdorferiB. pseudomalleiC. jejuniE. faecalisE. coliE. (EPEC) coli E.(ETEC) coli F.(UPEC) tularensisH. influenzaeH. pyloriH. pylori(G27)K. (J99)pneumoniaeL. pneumophilaL. monocytogenesM. tuberculosisN. gonorrhoeaeN. meningitidisP. aeruginosaS. entericaS. flexneri TyphimuriumS. aureusS. aureus (MRSA)S. epidermidis (MSSA)S. agalactiaeS. pneumoniaeS. pyogenesS. suisV. choleraeY. pseudotuberculosis 3.5
2.8
2.1
1.4
0.7
Stress/Control As Bs 0 B 50 Li Hyp 40 Nd 30 Ns Oss 20 Oxs
% tmRNA of total % tmRNA 10 Sp Tm 0 Vic C 80 70 Ctrl 60 50 40 30 20 10 0 % tmRNA of non-coding % tmRNA 50 50 1304465 KPN_01149 1304782 D E F G 6 SAR2467 SAR2468 5 40 40 4 3 1.4 2 30 30 1 MRSA 0 0.7 20 20 1.8 SAS2270 SAS2271 1.5 1.2 10 0 10 Transcriptional 0.9 % reads mapped % reads mapped Mapped reads (M) start site 0.6 0 0 Mapped reads(M) 0.3 MSSA 0
CsrB MRSAMSSA
KPN_01149 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Figure 7
A B As Bs Ns Sp Tm Li Hyp Nd Oss Oxs Vic PA0077 PA0078 PA0079 PA0080
PA0081 As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic PA0082 PGF_09084129 adhE PA0083 PGF_06077968 ftn PA0084 PGF_00644224 manZ PA0085 T6SS-1 PGF_10382119 malK PA0086 PGF_00027674 osmY PA0087 PA0088 PTDEX score PA0089 0 2 4 6 8 PA0090 P. aureginosa P. PA0091 PA0092 PA0093 PA0094 PA0095 PA1656 PA1657 PA1658 PA1659 C As PA1660 adhE Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic T6SS-2 PA1661 A. actinomycetemcomitans PA1662 E. coli (UPEC) PA1664 E. coli (EPEC) PA1665 E. coli (ETEC) PA1666 K. pneumoniae P. aureginosa P. PA1667 S. flexneri PA1668 S. enterica Typhimurium PA1669 V. cholerae Gram-negatives PA2360 Y. pseudotuberculosis PA2361 E. faecalis PA2362 L. monocytogenes PA2363 S. epidermidis PA2364 S. aureus (MSSA) T6SS-3 PA2365 S. aureus (MRSA) PA2366 S. agalactiae PA2367 S. pneumoniae Gram-positives PA2368 S. pyogenes PA2369 S. suis P. aureginosa P. PA2370 PA2371 FoldChange (Log2) PA2373 -15 -5 -2 -1 -0.4 0.4 1 2 5 15 FoldChange (Log2)
-15 -5 -2 -1 -0.4 0.4 1 2 5 15
E. coli (EPEC) adhE K. pneumoniae adhE S. enterica Typhimurium adhE D 8000 10000 2000 8000 6000 1500 6000 4000 1000 TP M TP M TP M 4000 2000 500 2000
0 0 0 Li Li Li As Bs As Bs As Bs Ns Sp Ns Ns Sp Sp Nd Nd Nd Vic Vic Vic Tm Tm Tm Ctrl Ctrl Ctrl Hyp Oss Oxs Oss Oxs Oss Oxs Hyp Hyp
30000 L. monocytogenes adhE 5000 S. aureus (MRSA) adhE 6000 S. epidermidis adhE
4000 20000 4000 3000 TP M TP M TP M 2000 10000 2000 1000
0 0 0 Li Li Li As As Bs As Bs Bs Ns Ns Sp Ns Sp Sp Nd Nd Nd Vic Vic Vic Tm Tm Tm Ctrl Ctrl Ctrl Oss Oxs Oss Oxs Hyp Hyp Hyp Oss Oxs