bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

RNA Atlas of Human Bacterial Pathogens Uncovers Stress Dynamics Linked to Infection

Kemal Avican1*, Jehad Aldahdooh2,3, Matteo Togninalli4,5, Jing Tang2,3, Karsten M. Borgwardt4,5, Mikael Rhen6 and Maria Fällman1,7*

1Department of Molecular Biology, Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå Centre for Microbial research (UCMR), Umeå University, Umeå, 90187, Sweden. 2Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, 00014, Finland. 3Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, 00014, Finland 4Department for Biosystems Science and Engineering, ETH Zürich, Basel, 4058, Switzerland. 5Swiss Institute for Bioinformatics, Switzerland. 6Department of Microbiology, Tumor and Cell Biology (MTC), Karolinska Institute, Stockholm, 17165, Sweden 7Lead contact *Correspondence : [email protected] (K.A.), [email protected] (M.F.)

Abstract

Despite being genetically diverse, bacterial pathogens can adapt to similar stressful

environments in human host, but how this diversity allows them to achive this is yet not

fully understood. Knowledge gained through comparative genomics is insufficient as it

lacks the level of gene expression reflecting gene usage. To fill this gap, we

investigated the transcriptome of 32 diverse bacterial pathogens under 11 host related

stress conditions. We revealed that diverse bacterial pathogens have common

responses to similar stresses to a certain extent but mostly employ their unique

repertoire with intersections between different stress responses. We also identified

universal stress responders which shed light on the nature of antimicrobial targets. In

addition, we found that known and unknown putative novel ncRNAs comprised a

significant proportion of the responses. All the data is collected in PATHOgenex atlas,

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

providing ample opportunities to discover novel players critical for virulence and

maintenance of infection.

Introduction

Bacterial pathogens with different genetic and physiological features share the

capacity to sense and respond to external changes in the host via regulation of their

global transcriptome. These responses, commonly called stress responses, are critical

for pathogen invasiveness and protection from host immunity, which makes them

topics of interest for future development of new antimicrobials. The responses are often

complex, and synergistic regulation of regulatory networks can be pivotal in sensing

and adapting to environments in different colonization niches during different phases

of infection (Avican et al., 2015; Heine et al., 2018; Malachowa et al., 2015; Mandlik et

al., 2011; Nuss et al., 2017). Infecting encounter numerous stress conditions

which vary depending on the route and phase of infection. For many pathogens, one

of the first changes upon infection of mammalian hosts is altered temperature.

Consequently, bacterial virulence factors, such as type III secretion systems in Yersinia

spp. and spp., are positively regulated by temperature (Bolin et al., 1988;

Prosseda et al., 2004). Low pH in the gastrointestinal tract (GIT), genital tract, dental

plaque, and skin, and acid exposure in phagosomes represents additional stressful

environments that have to adapt to (Lukacs et al., 1991; Lund et

al., 2014). Another agent commonly affecting enteric bacteria are bile salts produced

in the liver and secreted into the GIT, as well as secondary bile salts produced by the

microbial flora (Hofmann and Hagey, 2008). The hyperosmotic nature of the blood

stream and GIT can also be harsh for certain pathogens, inducing the expression of

virulence genes in some cases, such as in and

(Ishikawa et al., 2012; Loh et al., 2007). Limited nutrient, oxygen, and iron levels in the

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

host are other stresses that most pathogens encounter and have to adapt to for survival

(Fang et al., 2016). Furthermore, many invading bacteria encounter and

macrophages, which produce toxic reactive oxygen and nitrogen species.

Bacterial pathogens are diverse regarding gene content and regulatory networks, but

yet equally successful to adapt to these host stresses. Comparative genomics studies

have enabled accumulated knowledge of the diversity of bacterial pathogens.

However, the question, how and when different gene products are employed by

diverse pathogens to cope with same stresses encountered in human host remains to

be answered. To complement comparative genomics studies and answer those

questions, global gene expression profiling of diverse bacterial pathogens under host

related conditions is desired. Even though there are large amount of gene expression

data available for many bacterial pathogens, the collection of such dataset from

different sources is not optimal to complement comparative genomics due to

differences in sample handling, experimental methodologies, and incomplete datasets.

In this study, we analyzed the global expression profile of 32 different bacterial

pathogens, representing 28 different species, under 11 infection-relevant stress

conditions with same experimental set-up and methodology. We assessed expression

profile of 105 088 genes and identified common and specific gene groups to certain

bacterial groups with comparative genomics. All datasets were collected in an

interactive RNA atlas, named PATHOgenex (www.pathogenex.org), freely available to

the research community to interrogate expression profiles of human pathogens. The

datasets were used to uncover similarities and discrepancies in different stress

responses across different groups of bacteria. This was made possible by grouping

genes present in different bacteria according to function and homology and computing

a score showing the probability to be regulated in a particular environment. These

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

scores also allowed us to identify universal stress responders, which are genes that

are present in many bacterial species showing altered expression in multiple stress

conditions. This group of genes could potentially serve as a source for studies aiming

to identify novel targets of antimicrobials as it contain many already known targets. We

could also show that non-coding regulatory RNAs can be differentially regulated in

response to stressful environments and that novel small and long non-coding RNAs

can be explored from the dataset.

Results and Discussion

Consensus In Vivo Stress Conditions

The majority of bacteria chosen for the PATHOgenex RNA atlas represent pathogens

causing worldwide health problems. Most of the strains are commonly used in the

microbial research community and diverse in terms of Gram staining, phylogeny, and

oxygen requirement, which should facilitate interpretation and use of data (Figure 1A).

We designed 10 experimental setups mimicking different infection-relevant stress

conditions expected to be encountered by bacterial pathogens in the human host:

acidic stress, bile stress, low iron, hypoxia, nutritional downshift, nitrosative stress,

osmotic stress, oxidative stress, high temperature and starvation (stationary phase). A

very comprehensive suite of infection relevant conditions were previously described

for serovar Typhimurium (Kroger et al., 2013). We used similar

conditions that are applicable to diverse human bacterial pathogens. We also included

exposure to specific in vitro virulence inducing conditions, such as shifting Yersinia

pseudotuberculosis from 26°C to 37°C and depleting extracellular Ca2+, L-lactate

supplementation for Neisseria spp. (Sigurlasdottir et al., 2017), and presence of

charcoal-like resin XAD-4 in the growth medium for monocytogenes

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

(Ermolaeva et al., 2004) (Figure 1B). For some species, such as Mycobacterium

tuberculosis and , virulence inducing conditions are unknown

and were not performed. As control for differential expression analyses, we utilized

unexposed, exponentially grown bacteria. The growth temperature and culture

medium were adjusted for each specific strain to achieve optimal growth conditions.

Similarly, the strength of the stress with the different agents and associated exposure

time was designed to be as similar and relevant as possible for each bacterial species.

However, minor changes were necessary for certain conditions; for example, the “low

pH level” used for H. pylori strains were lower than for the others, and a significantly

higher H2O2 concentration was required to induce a stress response in Acinetobacter

baumannii (Table S1).

Large Scale RNA-seq generates Accurate Datasets

All stress experiments were performed in laboratories with expertise in

growing/studying the different strains (Figure 1A and Figure S1). Three biological

replicates were used for statistical analyses, in total 1 122 rRNA-depleted RNA

samples were prepared using RNAtag-Seq, an approach based on tagging RNA

samples with barcode oligonucleotides, enabling pooling of multiple samples in one

RNA-seq library (Shishkin et al., 2015) (Figure 1C). This strategy is expected to limit

biases during sample preparation. We used a set of 36 previously described barcode

DNA oligonucleotides (Shishkin et al., 2015) in which the same set of three oligos was

applied for each condition for consistency (Key Resources). The sequencing reads

from each library were demultiplexed according to the 8-nt unique barcode reads to

separate the different samples. The number of reads for the different barcoded

samples did not reveal significant variation across conditions and species (Table S2),

excluding bias regarding over-representation of one or a set of barcodes during library

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

preparation. We obtained an average of 12.2 million reads for each rRNA-depleted

barcoded library, which far exceeds the 2-3 million reads considered sufficient to

determine differentially expressed genes from bacteria with high significance (Haas et

al., 2012). The average mapping of the reads to the reference genomes was >90%

(Table S2, Key Resources). The lowest coverage (67.3%) was seen for the clinical

isolate of Achromobacter xylosoxidans, likely due to the lack of an appropriate

reference genome. In this case, we used the sequenced A. xylosoxidans genome that

gave the highest number of mapped reads. In general, the read mapping to reference

genomes showed efficient rRNA depletion, with only 0.1-5% reads mapping to rRNA

and tRNAs, except for , two of the H. pylori strains, and Neisseria

gonorrhoeae, which had higher proportions mapping to rRNA (Table S2).

Nevertheless, the library sequencing for these species was deep enough to cover 94%

of the coding sequences (CDSs), with >10 reads per CDS in the least efficiently

depleted library.

Genome Size and Infection Niche Influence Gene Expression

To determine what proportions of genes are active or silent under tested conditions we

calculated the expression levels of annotated CDSs, the transcript per million (TPM)

values, for all strains. Hierarchical clustering of expression profiles clustered replicates

of the same sample together, indicating a robust measurement of gene expression in

all conditions and strains (Figure S2). For gene expression analysis, genes with TPM

≥ 10 were considered to be active and genes with TPM < 10 were considered to be

silent, which is similar to definitions used by others (Kroger et al., 2013; Wagner et al.,

2013). We found that 32-95% of all genes in each of the 32 strains were constitutively

active across the 12 tested conditions. There were also genes only active in certain

conditions (one to 11 conditions), but a set of genes (0.7-20%) were silent in all of the

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

conditions (Figure 2A). The silent genes may require more complex environments with

synergistic effects of many stresses, as in real in vivo environments, to be expressed.

It was obvious that bacterial species with relatively smaller genomes had a greater

proportion of active genes in all conditions (Figure 2B). This may indicate conservation

of indispensable genes during the evolutionary shrinkage of the genomes. Reductive

evolution is a well-known event in endosymbionts and pathogenic bacteria with respect

to adaptation in defined niches (Miravet-Verde et al., 2017). This is illustrated by the

larger genomes of ubiquitous environmental species, such as E. coli, and smaller

genomes of niched pathogens, such as Borrelia burgdorferi. A more flexible response

was seen in turning transcription on and off in bacteria with larger genomes, which may

be related to the function and number of regulatory proteins within the species.

Comparative genomics studies of gene content and transcription machinery have

previously demonstrated that the number of transcription factors per open reading

frame increases with genome size in many pathogenic bacteria (Perez-Rueda et al.,

2009). We observed the same pattern within the set of pathogens in this study (Figure

2C). Thus, previous assumptions of less flexibility in turning gene expression on and

off in pathogens with reduced genomes were here verified at the RNA level across

multiple bacterial species.

Most of the bacteria with reduced genomes represent pathogens with restricted niches,

which match the low requirement of dynamic changes in gene expression (Figure 2A).

M. tuberculosis and P. aeruginosa did not follow this trend (Figure 2B), suggesting

that reduced regulation of gene expression is not always due to genome reduction.

Hierarchical clustering based on profiles of constitutively active genes, genes active in

certain conditions, and silent genes generated two bacterial clusters: ubiquitous

environmental species (C1) and niched pathogens (C2) (Figure 2D). Comparing the

7 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

genome sizes of the strains in the two clusters clearly showed that the strains with

restricted infection niches had smaller genomes (Figure 2E). Therefore, our data verify

at the level of RNA expression that bacterial pathogens with restricted niches, mostly

with reduced genomes, have lower plasticity to turn gene expression off.

Bacteria Respond to Stress via Dynamic Changes in Gene Expression

To reveal gene expression patterns associated with responses to different

environmental stimuli we performed differential gene expression analyses for each

strain. Compared to gene expression in unexposed, exponentially grown control

bacteria, most bacteria dynamically responded to the stresses by regulating 62-90%

of their genes in at least one condition (Figure 3A). The highest fraction of regulated

genes was measured under hypoxia, nutritional downshift, and stationary phase

conditions (Figure 3B). The differential expression analysis was verified by checking

for differential expression of known genes that are expected to be regulated under

certain stress conditions. We found that most genes expected to be induced under the

different conditions were upregulated in the species harboring the gene, clearly

showing the accuracy of the data (Figure S3A).

Given that different bacteria have distinct features, we also analyzed the levels of

responses for different groups of bacteria, such as Gram-negative and -positive,

aerobic and microaerophilic, different phylogenic orders (Figure 1A), and the niched

pathogens versus environmental species groups (Figure 2D). The levels of responses

were relatively similar between different groups. For Gram-negative and -positive

strains, however, responses to bile were significantly higher in the latter group, possibly

reflecting a higher magnitude of stress due to having only one plasma membrane

(Figure 3C, Figure S3B, C, and D).

Clustering Genes into Gene Groups for Cross Microbial Comparisons

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Comparison of differential expression of genes with similar functions or homologous

genes across different bacterial species is important for a broad and comprehensive

understanding of gene regulation and gene function in terms of adaptation to new

environments. Therefore, we clustered genes from the 32 strains into gene groups

using two different orthology and homology approaches: one based on functional

orthologs using KEGG’s annotation tool GhostKOALA (Kanehisa et al., 2016), and one

based on isofunctional homologs using the PATRIC database with Patric Global Family

(PGFam) (Wattam et al., 2014). KEGG orthology (KO) numbers are assigned to genes

harboring the same function, independent of sequence homology, and can include

genes with different sequences and lengths. For example, KO0123 is assigned to the

genes encoding formate dehydrogenase major subunit, which includes all major

subunits of formate dehydrogenase, including FdhF, FdnG, FdoG, and NapA in our

dataset. On the other hand, PGFam numbers are assigned to genes with the same

function and sequence homology (i.e., isofunctional homologs) as indicated by RAST

and CoreSEED (Davis et al., 2016). Therefore, PGFam groups are more specific in

terms of function and homology, but with fewer genes per group. Consequently, in

contrast to KO groups, formate dehydrogenase subunits were assigned to four

different PGFam groups. For KEGG, the tool assigned 6054 KO numbers to 63 831

genes, representing 60.7% of the total 105 088 genes in the dataset (Figure 4A). For

PGFam, 27 999 PGFam numbers were assigned to 97 565 genes, representing 92.8%

of the total number of genes in the dataset.

Revealing Probability of Gene Groups to be Differentially Expressed

To predict the probability of the genes within the KO or PGFam groups to be regulated

under certain stress conditions, we formulated an equation that computes a stress

condition-specific score indicating differential regulation of genes in the group, defined

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

as ‘probability to be differentially expressed (PTDEX) score’. The equation was

employed to gene groups that contain at least 2 genes (5 340 KO groups and 11 353

PGFam groups) (Table S3 and S4). The resulting PTDEX score takes into account

how well conserved the genes in the group are among the 32 pathogens, what

proportion of the genes in the group are differentially regulated under a particular stress

condition, and how well this regulation is preserved among the 32 strains (Figure 4B).

For example, a gene group that is present in all 32 species and differentially expressed

under acidic stress in many of the species will have a high PTDEX score for acidic

conditions. This not only indicates a high probability of the genes in this group being

differentially expressed under acidic stress, but can also hint regulation of homologous

genes from bacterial pathogens not included in this data set. Hence, using PTDEX

scores for comprehensive understanding of gene regulation in bacteria is a novel

approach providing in depth overview of common and specific regulation of same

genes in different species and simplifies cross microbial comparisons (Figure 4C).

To test the power of the resulting PTDEX scores, we ranked the KO and PGFam

groups based on PTDEX scores in low iron and oxidative stress conditions and plotted

the groups representing the highest 20 (Figure 4D and 4E). Both KO and PGFam

PTDEX scores were found to be indicative for regulation under these conditions. Many

of the genes with high PTDEX scores for low iron represented genes encoding proteins

known to be involved in iron uptake and iron homeostasis, such as tonB, exbB, exbD,

mntABC, bfr, bfd, ibpA, ibpB, sufA, feoB, and nuoA (Figure 4D). Similarly, the genes

with highest KO and PGFam PTDEX scores in oxidative stress were genes known to

respond to DNA damage and oxidative stress, such as lexA, soda, sodB, aphC, dinI,

umuC, rtpR, and iscA (Figure 4E). As expected the highest 20 were not exactly the

same for the KO and PGFam groupings, but most of the expected gene groups were

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

found in both. The higher PTDEX score observed for PGFam groups is likely due to

more similar regulation of isofunctional homologues than of functional orthologs.

Therefore, to derive more accurate conclusions, we used PGFam PTDEX scores for

the remainder of our analysis.

PTDEX Scores Enable Comparisons Between Stress Responses of Evolutionary

Distinct Bacterial Groups

The gene expression data in PATHOgenex were collected from diverse bacterial

species (Figure 1A) with distinctive gene content and transcriptional regulatory

networks. Common transcriptional regulatory patterns, between for example Gram-

negative and -positive bacteria, are anticipated to be rare. Therefore, we also re-

clustered genes of Gram-negative (21 strains from ,

Betaproteobacteria, and ) and Gram-positive strains (9 strains

from Bacilli) into G- and G+ PGFam groups and then re-computed the PTDEX scores

separately (Table S5 and S6). We then generated a similarity matrix with the Pearson

correlation coefficient for PTDEX scores from 7105 G- PGFam groups representing 47

468 genes and 2903 G+ PGFam groups representing 14 200 genes for each stress

condition. It was here obvious that despite the substantial difference in PGFam groups

(Figure 5A), parts of the responses to external changes were indeed shared by Gram-

negative and -positive bacteria (Figure 5B). The conditions that showed highest

overlaps were hypoxia, nutritional, and stationary phase.

Despite this similarities there is significant differences in responses by Gram-positive

versus Gram-negative strains, which supports previous studies (Rodionov, 2007).

Transcription factors, especially global regulators, are poorly conserved in general, and

transcriptional regulation has been found to be more flexible than their target genes

(Lozada-Chavez et al., 2006). In support of this, an analysis of differential expression

11 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

of transcription factors in response to the different stresses revealed high diversity

between Gram-negative and Gram-positive strains, as expected (Figure S4). Most of

the transcription factors were unique to either Gram-positive or -negative bacteria, and

the few homologs present were mostly regulated differently. Further, the observed

difference in response to bile, where Gram-positive, but not Gram-negative bacteria

responded with expensive differential gene expression (Figure 3C) was also reflected

here, with very few high PTDEX score transcription factors for these conditions in

Gram negatives (Figure S4).

PTDEX Scores Enable Identification of Intersections Between Stress Responses

Reponses to different stressors are expected to partly overlap due to involvement of

the same molecular pathways, as well as functional diversity of gene products. Many

stress responses are for example expected to involve halted growth, enabling

translational adjustments and redirection of resources. To identify overlaps between

different stress responses, the PTDEX scores of G-/G+ PGFam groups were used to

compare responses among the Gram-negative and -positive bacteria separately.

These comparisons indicated a higher degree of overlap between responses in Gram-

negative bacteria, where especially hypoxia, stationary phase, and nutritional

downshift overlapped to a high extent. Furthermore, also responses to low iron,

oxidative and nitrosative stress, as well as responses to acidic and osmotic stress

overlapped (Figure 5C).

To reveal overlapping pathways and processes, we next implemented a co-expression

module identification algorithm (Russo et al., 2018) on G- and G+ PGFam PTDEX

scores. This algorithm generated six modules of PGFam groups for Gram-negatives

showing common patterns of PTDEX scores in certain combinations of stress

conditions (Figure 5D, Table S5). KEGG pathway mapping of the genes in these

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

modules indicated overlapping pathways and processes involved in different stress

responses (Figure 5D, N-module 1-6). It was obvious that the response to hypoxia

partly overlapped with responses from all of the other stress conditions (N-module 1

and 3-5), except the response to osmotic stress that seemed to be unique (N-module

6). The largest overlap was seen between responses to hypoxia, nutritional downshift

and stationary phase, reflecting the complexity and high requirement of multiple

pathways for managing these situations. The overlapping expression patterns included

many genes involved in carbon metabolism, protein synthesis, oxidative

phosphorylation and quorum sensing (N-module 1). Responses to low iron, nitrosative,

and oxidative stress also shared PTDEX score patterns (N-module 2). In Gram-positive

strains, the degree of overlap between stress responses was lower (Figure 5C) and

only three modules were generated (Figure 5D, P-module 1-3). The largest overlap

here was between hypoxia and stationary phase (P-module 1), and to a lesser extent

between nutritional downshift and temperature (P-module 2). In contrast to that seen

for Gram-negative strains, some PGFams groups with high PTDEX scores were

specific to hypoxia only (Figure 5D, P-module 3).

PTDEX Scores Reveal Universal Stress Responders Including Known

Antimicrobial Targets

Although the Gram-negative and -positive bacterial pathogens studied here differ in

gene content, the preserved genes may have fundamental roles for bacterial

physiology and adaptation to stressful environments. Participation in multiple stresses

can be key to their conservation during the evolutionary diversification of species. To

retrieve genes encoding these universal stress responders (USRs), we selected the

PGFam groups that were differentially expressed in at least six conditions and

contained at least 11 genes from Gram-negative strains (21 strains in total) and 4

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

genes from Gram-positive strains (9 strains in total) (Table S7). We used the general

PGFam PTDEX scores and the threshold for ‘high PTDEX score’ was determined to

be ≥0.25, which indicated the score value where at least 50% of the genes in the group

are differentially expressed (Figure S5). These criteria allowed identification of 421

USRs. Functional clustering of the USRs with PATRIC subsystems (Wattam et al.,

2017) revealed that USRs are involved basic biological processes (Figure S6).

Interestingly, 24 of the identified USRs were genes representing targets for antibiotics,

where mutations in their sequences been shown to confer antibiotic resistance (Table

1). Hence, it is thus likely that novel putative antibiotic targets are to be found among

remaining 397 USRs.

Exploring Stress Related Expression from Non-coding Regions

In addition to differential expression of annotated genes, this data set also provides

information of the complete transcriptional landscapes in different pathogens.

Analysing overall transcription, we surprisingly observed that a substantial proportion

of RNAs were transcribed from non-coding DNA regions, including sRNAs, regulatory

non-coding RNAs, 5’ untranslated regions (UTRs), 3’-UTRs, and intergenic regions

(Table S2). The proportion is expected to be even higher than calculated here since

the majority of sRNAs were excluded during sample preparation due to the >100 nt

cut-off for library preparation. In some bacteria, expression of non-CDSs was higher

than CDSs in hypoxia, nutritional downshift, and stationary phase conditions. Such a

large number of transcripts from non-CDSs in bacteria has not been reported

previously, likely due to studies focusing on CDSs usually only consider annotated

CDSs for mapping reads, and that studies of non-coding RNAs generally use RNA

purification methods dedicated to sRNAs. Noteworthy, for most species, the percent

of transcripts from non-CDS was increased upon stress exposure in comparison to

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

control (Figure 6A). This indicates that expression from non-coding regions also

contribute bacterial response to stress conditions.

Stress Responses Includes tmRNA and Various non-coding RNAs

To reveal the non-CDS transcripts responsible for the observed increases of non-CDS

transcripts we analyzed the read mappings and found that the stress associated

increases in many cases correlated with increased thexpression of tmRNA (Figure

6B). This is interesting as tmRNA, also known as SsrA, is a conserved and abundant

type of RNA in bacteria that is important for trans-translation, a process that restores

translation in detrimental situations (Gueneau de Novoa and Williams, 2004; Keiler,

2007; Moore and Sauer, 2005). For many bacteria, lack of tmRNA results in growth

defects, sensitivity to various stress conditions, and attenuation of virulence (Brito et

al., 2016; Munavar et al., 2005; Okan et al., 2006; Svetlanov et al., 2012). The

increased expression of tmRNA (Figure 6B, Table S8) indicates that trans-translation

is part of a common adaptation to challenging environments. The induction of tmRNA

are particularly high in stationary phase, hypoxia, and upon nutritional downshift, which

indeed are conditions associated with halted growth. In line with this, smpB, encoding

a protein that interacts with tmRNA, was here identified as a USR with high PTDEX

scores in stationary phase, hypoxia, and nutritional downshift (Figure S6).

The level of tmRNA was however not high for all strains and conditions (Figure 6C),

suggesting that also other non-coding RNAs are induced upon external stress. This

was for example the case for (Figure 6B), where an analysis

of the mapped reads across its genome revealed that expression of the carbon storage

regulatory non-coding RNA CsrB correlated with expression ratios for the intergenic

regions in many the stress conditions (Figure 6D). CsrB in Klebsiella spp. has not been

extensively investigated, but has in other species been shown to be involved in

15 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

regulation of many biological processes, such as carbon metabolism, biofilm formation,

quorum sensing, and virulence (Fortune et al., 2006; Lenz et al., 2005; Liu et al., 1997).

However, even though expression of CsrB correlated with increased transcription of

non-CDSs in most of the conditions, this was not the case for stationary phase (Figure

6D). For the stationary phase we found high extent read mapping to 5’-UTR and 3’-

UTR regions of a CDS (KPN_01149) encoding a hypothetical protein (Table S2,

Figure 6D and E). The expressed region started at a position previously reported to

be a transcriptional start site (Kim et al., 2012), supporting the accuracy of this finding

(Figure 6E). The function of KPN_01149 is not known, but the presence of stress-

induced bacterial acidophilic repeat motifs supports a role in stress responses.

Similar to that of K. pneumoniae also S. aureus showed expression from non-coding

regions that not corresponded with expression of tmRNA. Here we found high level of

reads mapping to a 1230-nt intergenic region situated on the negative strand between

two CDSs encoding a hypothetical protein and an AraC family transcriptional regulator

(Figure 6F and G). Interestingly, by analysing data from a previous study (Szafranska

et al., 2014) we found that expression of this intergenic region is increased during S.

aureus infection (Figure S7B). This sequence is only found in S. aureus and two other

Staphylococcus spp., S. argenteus and S. schweitzeri. Though S. argenteus is known

to colonize humans, S. schweitzeri isolates have been recovered from fruit bats and

non-human primates, including gorilla (Becker et al., 2019). Given the broad repertoire

of different environmental conditions and the number of species, detailed analyses of

mapped reads in the different data sets provide opportunities for finding novel

intergenic non-coding RNAs or UTRs. Both K. pneumoniae and S. aureus are

examples of human pathogens responsible for causing troublesome infections, often

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

involving biofilm and long-term treatments, thereby contributing to increased antibiotic

resistance.

The PATHOgenex RNA Atlas Provides Opportunities for Exploring Gene

Expression in Human Pathogens

All transcription data have been collected in an interactive database designed to serve

researchers, the PATHOgenex RNA atlas (www.pathogenex.org). The data can be

browsed by selecting the strain of interest and searched with one or multiple locus

tags, protein ID, gene product, and PGFam groups. PATHOgenex provides many

opportunities for exploring regulation and function of pathways and gene products in

human pathogens.

As example, we here show differential expression of type VI secretion systems

(T6SSs) in P. aeruginosa PAO1. The genes encoding effectors and other components

of T6SS are organized in operons, and many bacteria harbor multiple T6SS operons.

Different operons are thought to be regulated differently and have different functions,

but current knowledge about their regulation is limited. Utilizing PATHOgenex, we can

now retrieve information of their regulation, supplying a basis for further studies.

PATHOgenex retrievals show that the three P. aeruginosa PAO1 T6SS operons are

regulated under different stress conditions. T6SS-1 is induced upon exposure to acidic

and oxidative stress and to some extent upon nitrosative and osmotic stress but

repressed in hypoxia and stationary phase conditions. T6SS-2 is not induced by

oxidative stress and repressed both by temperature raise and during stationary phase,

T6SS-3 on the other hand is induced during stationary growth, implying a different

function and regulation for these effectors (Figure 7A).

PATHOgenex also provides opportunities for identification of gene products regulated

in responses to certain conditions by utilizing condition-specific PTDEX scores. As an

17 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

example, when retrieving the 5 PGFam groups with highest PTDEX scores in virulence

inducing condition, the bifunctional acetaldehyde dehydrogenase/alcohol

dehydrogenase adhE appeared in the top (Figure 7B). This gene product is known to

be involved in bacterial metabolism, but has also been attributed other functions such

as stress associated translational control and virulence (Beckham et al., 2014; Kurylo

et al., 2018). PTDEX scores of adhE and others can be analyzed further by retrieving

information on which bacteria have this gene and how this gene is regulated in the

different species (Figure 7C). The degree of fold-change is important to know for data

assessment as small fold changes in highly expressed genes are expected to have a

greater impact than small changes in genes with low expression. Therefore,

PATHOgenex also offers possibilities for retrieving information of expression levels for

individual species (Figure 7D).

Taken together, PATHOgenex provides comprehensive information of gene

expression at the gene level in a particular species, but also to a broader level including

gene groups and expression in multiple species. All of this information provides new

opportunities for researchers to generate novel hypotheses and design experiments

accordingly.

Acknowledgments

We are grateful to all the expert labs for opening their lab to us and help with bacterial

culturing and stress exposure experiments. We thank Drs. Peter Lind, Teresa Frisan,

and Saskia Erttmann for critical reading of the manuscript. The work was supported by

Knut and Alice Wallenberg Foundation (No. 2016.0063), Swedish Research Council

(No. 2018-02855), and Insamlingsstiftelsen, Medical Faculty at Umeå University to M

Fallman; Novo Nordisk Foundation (K. Avican was partly supported by grant, No.

NNF17OC0026486, awarded to Dr. Emmanuelle Charpentier at MIMS, The Laboratory

18 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

for Molecular Infection Medicine Sweden); ERC (Starting grant, No. 716063) to J.

Tang; Academy of Finland Research Fellow Grant (No. 317680) to J. Aldahdooh.

Author Contributions

K.A. and M.F. conceived and supervised the project, and wrote the manuscript with

input from J.A, M.T., J.T., K.B., and M.R.. K.A. performed the experiments and

analysed the data. M.T., K.B., K.A, and M.F. designed and M.T. calculated PTDEX

scores. J.A., J.T., K.A., and M.F. designed and J.A. constructed the PATHOgenex

RNA atlas and webpage.

Declaration of Interests

Authors declare no computing interests.

METHOD DETAILS

Bacterial growth and stress exposures

All bacterial strains shown in Figure 1A were grown in their optimal growth medium

and temperatures as indicated in Table S1 in laboratories specialized in each species.

Three bacterial cultures were grown overnight and then subcultured to a new culture

vial with 1:50-1:100 dilution. The cultures were grown until exponential phase with an

OD600 of 0.1-0.6 and exposed to the 11 stresses separately. The stress conditions,

reagents used to generate them, and the exposure times are given in Table S1. Un-

exposed cultures at exponential growth were used as controls for differential

expression analysis. For nutritional downshift, bacterial cultures were spun down at

9000 rpm for 2 minutes, the supernatant removed, the pellets resuspended in 1X M9

supplemented with 0.1 M MgCl2 and 0.1 M CaCl2, and incubated for 30 minutes at

ambient temperature for each strain. For hypoxia, 2 ml screw cap tubes were filled with

19 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

bacterial culture at exponential growth and the caps were screwed on without leaving

space for air. The stress exposures were stopped by adding 5% (final concentration)

phenol:ethanol solution (Figure S1).

Total RNA isolation

One milliliter of triplicate bacterial cultures exposed to the stresses and un-exposed

control culture were immediately pelleted by centrifugation at 9000 rpm at room

temperature for 2 minutes after adding phenol:ethanol. The supernatants were

removed and pellets resuspended in 0.5 ml Trizol solution. For Gram-negative

bacteria, cells were homogenized in Trizol solution by pipetting up and down 15 times.

For Gram-positive bacteria, culture suspensions in Trizol were transferred to previously

cooled bead beater tubes containing 0.1-mm glass beads and treated with Mini-

Beadbeater (Biospec Products Inc, USA) twice at a fixed speed for 45 seconds, and

then cooled on ice for 1 minute between the treatments. Culture homogenates were

incubated at room temperature for 5 minutes and then supplemented with 0.2 ml

chloroform, thoroughly mixed by shaking 10 times, and incubated for 3 minutes. After

centrifugation at 12 000 g at 4°C for 15 minutes, the aqueous upper phase was

carefully transferred to new RNase free tubes. An equal volume of 99% ethanol was

added to the aqueous phase and isolation continued using the Direct-Zol RNA

Miniprep Plus (Zymo Research, USA) RNA purification kit protocol. Total RNAs were

eluted in RNase free water in RNase free tubes. The total RNA concentrations were

measured using the Qubit BR RNA Assay Kit (ThermoFisher Scientific, USA) and RNA

integrity confirmed on a 0.8% agarose gel in TBE buffer.

RNA-seq library preparation with RNAtag-Seq

20 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

All rRNA-depleted RNA-seq library preparations were performed according to Shishkin

et al. (2015), with minor modifications. RNAtag-Seq allows multiple library preparations

in one tube, with initial tagging of total RNA samples with modified (5’P and 3’ SpcC3)

DNA barcoded adaptors, each harboring a unique 8 nt sequence used to demultiplex

individual libraries after sequencing. We combined the library preparation of three

biological replicates from 11 stress-exposed samples and the un-exposed control

sample, resulting in 36 library preparations in one tube for each bacterial strain. To tag

the 36 replicates, 36 unique barcoded adaptors were used, with every three barcodes

used for samples from the same stress conditions in all bacterial species (Table S2).

A total of 100 ng total RNA was used for each biological replicate. The total RNA was

fragmented in 2X FastAP Thermosensitive Alkaline Phosphatase buffer for 3 minutes

at 94°C, DNase treated, and dephosphorylated with a combination of TURBO DNase

and Thermosensitive Alkaline Phosphatase in 1X FastAP buffer for 30 minutes at

37°C. Fragmented, DNase-treated, and dephosphorylated total RNAs were cleaned

with a 2X reaction volume of Agencourt RNAClean XP beads. Cleaned total RNAs

were incubated with 100 pmol of the unique DNA barcode adaptors at 70°C for 3

minutes, and then ligated with T4 RNA ligase 1 for 90 minutes at 22°C. The ligation

was stopped and the enzyme denatured by the addition of RLT buffer. The 36

denatured ligation mixes were then pooled and cleaned in a Zymo Clean &

ConcentratorTM-5 column according to the manufacturer’s 200 nt cut-off protocol. The

RNAs were eluted in 32 µl of RNase free water. Ribosomal RNA was depleted using

the Ribo-ZeroTM Magnetic Gold Kit (Bacteria) according to the manufacturer’s

instructions. The first strand cDNA of each pool was generated using an AffinityScript

Multiple Temperature cDNA synthesis kit with 50 pmol of AR2 primer at 55°C for 55

minutes. The RNA was degraded by adding a 10% reaction volume of 1 N NaOH at

21 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

70°C for 12 minutes and the reaction neutralized with an 18% reaction volume of 0.5

M acetic acid. After cleaning the reverse transcription primers with a 2X reaction

volume of RNAClean XP beads, the 3Tr3 adapter was ligated with T4 RNA ligase 1 at

22°C with overnight incubation. The second ligation was cleaned first with a 2X, and

secondly 1.5X, reaction volume of RNAClean XP beads. The cDNA was then used as

the template for PCR reaction with FailSafeTM PCR enzyme mix using 12.5 pmol

2P_univP5 as forward primers and 12.5 pmol ScriptSeqTM Index (barcode) PCR

primers as reverse primers. The PCR cycles were as follows: 95°C for 3 minutes,

followed by 12 cycles at 95°C for 30 seconds, 55°C for 30 seconds, and 68°C for 3

minutes, and then finishing at 68°C for 7 minutes. The PCR product was cleaned first

with a 1.5X, and secondly 0.8X, reaction volume of AMPure beads and eluted in

RNAse free water. The library concentrations were measured using the QubitTM dsDNA

HS Assay Kit and the library insert size determined by the Agilent DNA 1000 Kit in a

2100 Electrophoresis Bioanalyser Instrument (Agilent, USA).

QUANTIFICATION AND STATISTICAL ANALYSIS

RNA-seq data analysis

The RNA-seq libraries generated by RNAtag-Seq were sequenced by either single end

or paired end Illumina sequencing at SciLifeLab, Stockholm. Each library harboring a

pool of 36 RNA samples was demultiplexed according to the unique 8 nt on the ligated

barcode adaptors to separate reads from each replicate. The sequencing reads

generated in this study were deposited in GEO with accession number GSEXXXXXX.

Demultiplexed reads were then mapped to the genome of the species to which the

RNAs belong in an annotation-independent manner. The accession numbers of the

RefSeq reference genomes used for each strain can be found in “Microbes and

Viruses” in Key Resouces. The number of reads mapped to CDSs, rRNA, and tRNA

22 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

was calculated according to the annotation files. The number of reads mapped to the

non-coding regions were calculated by subtracting the reads mapped to the annotated

regions from the total number of mapped reads. The sequencing reads from primary

transcripts of K. pneumoniae (Kim et al., 2012) with accession number SRR408498

were downloaded from Sequence Read Archive (SRA) and mapped to the same

reference genome used in this study. In vivo and in vitro RNA-seq reads of S. aureus

strain 6850 (Szafranska et al., 2014) with accession number ERP005459 were

downloaded from SRA and mapped to RefSeq reference genome CP006706. All of

the demultiplexing and read mapping steps were performed in CLC Genomic

Workbench (Qiagen, USA).

For differential gene expression analysis, the trimmed mean of M values (TMM)

(Robinson and Oshlack, 2010) normalization method was used to normalize the

sequencing depth of the individual libraries. For each bacterial species, the

comparisons were performed between multiple groups and the control sample so that

the full dataset could be used for fitting the generalized linear model. Using the

complete datasets with multiple group comparisons allowed us to determine whether

a gene has unstable expression for which the variation is random or is differentially

expressed in response to the stress. Read values for genes with a maximum group

mean expression (the maximum average TPM value in the statistical comparison

group) <20 were removed and a threshold of 1.5-fold change with FDR-adjusted p-

value <0.05 was employed to determine differential expression.

Generation of a phylogenetic tree of 32 bacterial strains

A phylogenetic tree of the 32 strains based on NCBI was generated with

PhyloT in Newick format. The visualization of the tree was achieved using iTol.

23 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Clustering 32 bacterial strains based on expression profiles

Hierarchical clustering, with Euclidian distance, of 32 strains based on the percent of

genes expressed (TPM≥10) in all 12 conditions, percent of genes expressed in at least

one condition, and percent of genes not expressed in any of the conditions was

performed with ClustVis (Metsalu and Vilo, 2015).

Clustering 105 088 genes based on functional orthologs and isofunctional

homologs

The KEGG orthology group was assigned for clustering genes based on functional

orthologs. The amino acid sequences of all CDSs for each strain were uploaded to

GhostKOALA (Kanehisa et al., 2016) to assign the best KO group in the

genus_prokaryotes KEGG GENES database. Clustering based on isofunctional

homologs was performed with PATRIC’s Proteome Comparison Service. The amino

acid sequences of all CDSs for each strain were compared using its own annotated

genome, if available, or the closest strain to assign the best PGFam group for each

CDS.

PTDEX score calculation

The PTDEX score was calculated for the orthology/homology groups using Python

scripts and Jupyter notebooks, relying on the open-source libraries NumPy and

Pandas. The PTDEX score formula relies on the definition of differentially expressed

genes given in “RNA-Seq Analysis” above and the equation shown in Figure 2B, where

ngenes(on) is the number of differentially expressed genes in the orthology/homology

gene group, ngenes(total) is the total number of genes in the group, nstrains(on) is the

number of strains for which at least one of the orthology group genes is differentially

regulated, nstrains(total) is the total number of strains for a given stress condition, and

nstrains(database) is the total number of strains in the database (nstrains(database) =32

24 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

for general PTDEX scores, 21 for Gram negative-specific, 9 for Gram-positive-specific

PTDEX scores). The additional logarithmic component is a weight factor to give more

relative importance to orthology/homology groups with more genes.

Generation of co-expression modules on PGFam PTDEX scores and associated

KEGG pathway enrichments

The co-expression Modules Identification tool (CEMiTool) (Russo et al., 2018) was

implemented on PGFam groups of Gram-negative and -positive strains to find PGFam

groups with similar PTDEX score patterns over the 11 stress conditions. PGFam

groups associated with each module can be found in Table S5 and Table S6. Each

PGFam group associated with the particular module, if possible, was converted in KO

groups and KO groups were used in the KEGG orthology database to map pathways.

PATHOgenex website construction

PATHOgenex was built using PHP Laravel Framework 6.18.2 and PHP 7.4.4 for

server-side data processing, Javascript ECMAScript 2015 for the front end, and D3.js

5.16.0 and Plotly 1.40.0 libraries for the generation of the interactive visualizations.

Linux distribution CentOS-8 with the 64-bit kernel 4.18.0 running on four processor

cores and 64 Gb of RAM is used to host the web service on the in-house computational

cluster.

References

Avican, K., Fahlgren, A., Huss, M., Heroven, A.K., Beckstette, M., Dersch, P., and Fallman, M. (2015). Reprogramming of Yersinia from virulent to persistent mode revealed by complex in vivo RNA-seq analysis. PLoS pathogens 11, e1004600. Becker, K., Schaumburg, F., Kearns, A., Larsen, A.R., Lindsay, J.A., Skov, R.L., and Westh, H. (2019). Implications of identifying the recently defined members of the Staphylococcus aureus complex S. argenteus and S. schweitzeri: a position paper of members of the ESCMID Study Group for Staphylococci and Staphylococcal Diseases (ESGS). Clin Microbiol Infect 25, 1064-1070. Beckham, K.S.H., Connolly, J.P.R., Ritchie, J.M., Wang, D., Gawthorne, J.A., Tahoun, A., Gally, D.L., Burgess, K., Burchmore, R.J., Smith, B.O., et al. (2014). The metabolic

25 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

enzyme AdhE controls the virulence of O157:H7. Molecular microbiology 93, 199-211. Bolin, I., Forsberg, A., Norlander, L., Skurnik, M., and Wolf-Watz, H. (1988). Identification and mapping of the temperature-inducible, -encoded proteins of Yersinia spp. Infection and immunity 56, 343-348. Brito, L., Wilton, J., Ferrandiz, M.J., Gomez-Sanz, A., de la Campa, A.G., and Amblar, M. (2016). Absence of tmRNA Has a Protective Effect against Fluoroquinolones in Streptococcus pneumoniae. Front Microbiol 7, 2164. Davis, J.J., Gerdes, S., Olsen, G.J., Olson, R., Pusch, G.D., Shukla, M., Vonstein, V., Wattam, A.R., and Yoo, H. (2016). PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database. Front Microbiol 7, 118. Ermolaeva, S., Novella, S., Vega, Y., Ripio, M.T., Scortti, M., and Vazquez-Boland, J.A. (2004). Negative control of virulence genes by a diffusible autorepressor. Mol Microbiol 52, 601-611. Fang, F.C., Frawley, E.R., Tapscott, T., and Vazquez-Torres, A. (2016). Bacterial Stress Responses during Host Infection. Cell Host Microbe 20, 133-143. Fortune, D.R., Suyemoto, M., and Altier, C. (2006). Identification of CsrC and characterization of its role in epithelial cell invasion in Salmonella enterica serovar Typhimurium. Infection and immunity 74, 331-339. Gueneau de Novoa, P., and Williams, K.P. (2004). The tmRNA website: reductive evolution of tmRNA in plastids and other endosymbionts. Nucleic Acids Res 32, D104- 108. Haas, B.J., Chin, M., Nusbaum, C., Birren, B.W., and Livny, J. (2012). How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes? Bmc Genomics 13. Heine, W., Beckstette, M., Heroven, A.K., Thiemann, S., Heise, U., Nuss, A.M., Pisano, F., Strowig, T., and Dersch, P. (2018). Loss of CNFY toxin-induced inflammation drives Yersinia pseudotuberculosis into persistency. PLoS Pathog 14, e1006858. Hofmann, A.F., and Hagey, L.R. (2008). Bile acids: chemistry, pathochemistry, biology, pathobiology, and therapeutics. Cell Mol Life Sci 65, 2461-2483. Ishikawa, T., Sabharwal, D., Broms, J., Milton, D.L., Sjostedt, A., Uhlin, B.E., and Wai, S.N. (2012). Pathoadaptive conditional regulation of the type VI secretion system in Vibrio cholerae O1 strains. Infection and immunity 80, 575-584. Kanehisa, M., Sato, Y., and Morishima, K. (2016). BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences. Journal of molecular biology 428, 726-731. Keiler, K.C. (2007). Physiology of tmRNA: what gets tagged and why? Current opinion in microbiology 10, 169-175. Kim, D., Hong, J.S., Qiu, Y., Nagarajan, H., Seo, J.H., Cho, B.K., Tsai, S.F., and Palsson, B.O. (2012). Comparative analysis of regulatory elements between Escherichia coli and Klebsiella pneumoniae by genome-wide transcription start site profiling. PLoS genetics 8, e1002867. Kroger, C., Colgan, A., Srikumar, S., Handler, K., Sivasankaran, S.K., Hammarlof, D.L., Canals, R., Grissom, J.E., Conway, T., Hokamp, K., et al. (2013). An infection- relevant transcriptomic compendium for Salmonella enterica Serovar Typhimurium. Cell host & microbe 14, 683-695. Kurylo, C.M., Parks, M.M., Juette, M.F., Zinshteyn, B., Altman, R.B., Thibado, J.K., Vincent, C.T., and Blanchard, S.C. (2018). Endogenous rRNA Sequence Variation Can Regulate Stress Response Gene Expression and Phenotype. Cell Rep 25, 236-+.

26 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Lenz, D.H., Miller, M.B., Zhu, J., Kulkarni, R.V., and Bassler, B.L. (2005). CsrA and three redundant small RNAs regulate quorum sensing in Vibrio cholerae. Molecular microbiology 58, 1186-1202. Liu, M.Y., Gui, G., Wei, B., Preston, J.F., 3rd, Oakford, L., Yuksel, U., Giedroc, D.P., and Romeo, T. (1997). The RNA molecule CsrB binds to the global regulatory protein CsrA and antagonizes its activity in Escherichia coli. The Journal of biological chemistry 272, 17502-17510. Loh, J.T., Torres, V.J., and Cover, T.L. (2007). Regulation of Helicobacter pylori cagA expression in response to salt. Cancer Res 67, 4709-4715. Lozada-Chavez, I., Janga, S.C., and Collado-Vides, J. (2006). Bacterial regulatory networks are extremely flexible in evolution. Nucleic Acids Res 34, 3434-3445. Lukacs, G.L., Rotstein, O.D., and Grinstein, S. (1991). Determinants of the phagosomal pH in macrophages. In situ assessment of vacuolar H(+)-ATPase activity, counterion conductance, and H+ "leak". The Journal of biological chemistry 266, 24540-24548. Lund, P., Tramonti, A., and De Biase, D. (2014). Coping with low pH: molecular strategies in neutralophilic bacteria. FEMS Microbiol Rev 38, 1091-1125. Malachowa, N., Kobayashi, S.D., Sturdevant, D.E., Scott, D.P., and DeLeo, F.R. (2015). Insights into the Staphylococcus aureus-host interface: global changes in host and pathogen gene expression in a rabbit skin infection model. PloS one 10, e0117713. Mandlik, A., Livny, J., Robins, W.P., Ritchie, J.M., Mekalanos, J.J., and Waldor, M.K. (2011). RNA-Seq-based monitoring of infection-linked changes in Vibrio cholerae gene expression. Cell Host Microbe 10, 165-174. Metsalu, T., and Vilo, J. (2015). ClustVis: a web tool for visualizing clustering of multivariate data using Principal Component Analysis and heatmap. Nucleic Acids Res 43, W566-570. Miravet-Verde, S., Llorens-Rico, V., and Serrano, L. (2017). Alternative transcriptional regulation in genome-reduced bacteria. Current opinion in microbiology 39, 89-95. Moore, S.D., and Sauer, R.T. (2005). Ribosome rescue: tmRNA tagging activity and capacity in Escherichia coli. Molecular microbiology 58, 456-466. Munavar, H., Zhou, Y., and Gottesman, S. (2005). Analysis of the Escherichia coli Alp phenotype: heat shock induction in ssrA mutants. Journal of bacteriology 187, 4739- 4751. Nuss, A.M., Beckstette, M., Pimenova, M., Schmuhl, C., Opitz, W., Pisano, F., Heroven, A.K., and Dersch, P. (2017). Tissue dual RNA-seq allows fast discovery of infection-specific functions and riboregulators shaping host-pathogen transcriptomes. Proc Natl Acad Sci U S A 114, E791-E800. Okan, N.A., Bliska, J.B., and Karzai, A.W. (2006). A Role for the SmpB-SsrA system in Yersinia pseudotuberculosis pathogenesis. PLoS pathogens 2, e6. Perez-Rueda, E., Janga, S.C., and Martinez-Antonio, A. (2009). Scaling relationship in the gene content of transcriptional machinery in bacteria. Mol Biosyst 5, 1494-1501. Prosseda, G., Falconi, M., Giangrossi, M., Gualerzi, C.O., Micheli, G., and Colonna, B. (2004). The virF promoter in Shigella: more than just a curved DNA stretch. Molecular microbiology 51, 523-537. Robinson, M.D., and Oshlack, A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 11, R25. Rodionov, D.A. (2007). Comparative genomic reconstruction of transcriptional regulatory networks in bacteria. Chem Rev 107, 3467-3497.

27 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Russo, P.S.T., Ferreira, G.R., Cardozo, L.E., Burger, M.C., Arias-Carrasco, R., Maruyama, S.R., Hirata, T.D.C., Lima, D.S., Passos, F.M., Fukutani, K.F., et al. (2018). CEMiTool: a Bioconductor package for performing comprehensive modular co- expression analyses. BMC bioinformatics 19, 56. Shishkin, A.A., Giannoukos, G., Kucukural, A., Ciulla, D., Busby, M., Surka, C., Chen, J., Bhattacharyya, R.P., Rudy, R.F., Patel, M.M., et al. (2015). Simultaneous generation of many RNA-seq libraries in a single reaction. Nature methods 12, 323- 325. Sigurlasdottir, S., Engman, J., Eriksson, O.S., Saroj, S.D., Zguna, N., Lloris-Garcera, P., Ilag, L.L., and Jonsson, A.B. (2017). Host cell-derived lactate functions as an effector molecule in microcolony dispersal. PLoS Pathog 13, e1006251. Svetlanov, A., Puri, N., Mena, P., Koller, A., and Karzai, A.W. (2012). tmRNA system mutants are vulnerable to stress, avirulent in mice, and provide effective immune protection. Molecular microbiology 85, 122-141. Szafranska, A.K., Oxley, A.P., Chaves-Moreno, D., Horst, S.A., Rosslenbroich, S., Peters, G., Goldmann, O., Rohde, M., Sinha, B., Pieper, D.H., et al. (2014). High- resolution transcriptomic analysis of the adaptive response of Staphylococcus aureus during acute and chronic phases of osteomyelitis. mBio 5. Wagner, G.P., Kin, K., and Lynch, V.J. (2013). A model based criterion for gene expression calls using RNA-seq data. Theory Biosci 132, 159-164. Wattam, A.R., Abraham, D., Dalay, O., Disz, T.L., Driscoll, T., Gabbard, J.L., Gillespie, J.J., Gough, R., Hix, D., Kenyon, R., et al. (2014). PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Res 42, D581-591. Wattam, A.R., Davis, J.J., Assaf, R., Boisvert, S., Brettin, T., Bun, C., Conrad, N., Dietrich, E.M., Disz, T., Gabbard, J.L., et al. (2017). Improvements to PATRIC, the all- bacterial Bioinformatics Database and Analysis Resource Center. Nucleic Acids Res 45, D535-D542.

28 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Table 1. USRs known to be target for antimicrobials Gene name Conditions Superclass Subsystem name alr (Caceres et al., 1997) Hyp Nd Oss Oxs Sp Vic Metabolism Amino acid racemase fabF (Young et al., 2006) Li Hyp Nd Ns Oss Oxs Sp Fatty acid synthesis Tm Vic fabG (Heath and Rock, 2004) As Li Hyp Nd Oss Oxs Sp Tm Vic fabI (Heath et al., 1998) Li Hyp Nd Ns Sp Tm folA (Toprak et al., 2011) Li Hyp Nd Ns Oss Oxs Sp Folate biosynthesis Tm Vic folC (Zhang et al., 2015) As Li Hyp Nd Oxs Sp Tm Vic folP (Fermer and Swedberg, As Li Hyp Nd Ns Oxs Sp 1997) Tm Vic panD (Zhang et al., 2013) Li Hyp Nd Oss Oxs Sp Coenzyme A biosynthesis gidB (Okamoto et al., 2007) As Li Hyp Nd Oss Oxs Sp Cellular processes Cell division initiation Tm Vic murA (De Smet et al., 1999) Hyp Nd Ns Oxs Sp Tm Vic Bacterial cell division fusA (Price and Gustafson, As Li Hyp Nd Ns Oxs Sp Protein processing Translation 2001) Tm Vic elongation tuf (Cetin et al., 1996) Li Hyp Nd Ns Oss Oxs Sp Tm Vic ileS (Antonio et al., 2002) As Li Hyp Nd Ns Oss Oxs tRNA aminoacylation Sp Tm Vic rplC (Locke et al., 2009) As Li Hyp Nd Ns Oss Oxs Ribosomal proteins Sp Tm Vic rplF (Norstrom et al., 2007) As Bs Li Hyp Nd Ns Oxs Sp Tm Vic rpsE (Gomez et al., 2017) As Li Hyp Nd Ns Oss Oxs Sp Tm Vic rpsJ (Kubanov et al., 2016) As Li Hyp Nd Ns Oss Oxs Sp Tm Vic rpsL (Traub and Nomura, 1968) As Li Hyp Nd Ns Oss Oxs Sp Tm Vic gyrA (Nakamura et al., 1989) Li Hyp Nd Ns Oxs Sp Vic DNA processing DNA topoisomerases gyrB (Schmutz et al., 2003) Li Hyp Nd Ns Oxs Vic parC (Tankovic et al., 1996) Li Hyp Nd Ns Oxs Sp Tm Vic parE (Nawaz et al., 2015) Hyp Nd Oss Oxs Sp Tm rpoB (Lisitsyn et al., 1984) Li Hyp Nd Ns Oxs Sp Tm RNA processing RNA polymerase Vic rpoC (Friedman et al., 2006) As Li Hyp Nd Ns Sp Tm Vic

29 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure Legends

Figure 1. Experimental setup for PATHOgenex RNA atlas

(A) Phylogenetic clustering of bacterial species included in the study. Phylogenetic

orders as well as Gram staining groups and oxygen dependency are indicated by color.

(B) Schematic overview of the 11 infection-related stress conditions and control (Ctrl)

sample not exposed to any stress: acidic stress (As), pH 3-5 for 10 minutes; bile stress

(Bs), 0.5% bile salts for 10 minutes; low iron (Li), 250 μM 2,2-dipyridryl for 10 minutes;

hypoxia (Hyp), low oxygen for 4 hours; nutritional downshift (Nd), incubation in 1X M9

salts for 30 minutes; nitrosative stress (Ns), 250 μM Spermine NONOate for 10

minutes; osmotic stress (Oss), 0.5% NaCl for 10 minutes; oxidative stress (Oxs), 0.5-

10 mM H2O2 for 10 minutes; stationary phase (Sp); temperature (Tm), 41°C for 20

minutes; varied virulence inducing condition (Vic). Some adjustments were made for

certain bacteria (see Table S1).

(C) Schematic illustration of the RNAtag-Seq approach allowing multiple samples (36

in this study) per library used for obtaining 1 122 transcriptomes deposited in the

PATHOgenex RNA atlas.

Figure 2. Niched pathogens have high proportion of constitutively active genes

(A) Diagrams showing; number of genes (gray), percentage of constitutively active

genes (red), percentage of genes active in certain (one to 11) conditions, and

percentage of constitutively silent genes. Genes with TPM≥10 are designated as

active, and TPM<10 as silent. Bacterial species are sorted according to the number of

CDSs in their reference genome.

(B) Scatter plot showing percentage of constitutively active genes for each species.

30 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

(C) Scatter plot showing the number genes annotated as transcriptional regulator per

CDS for each species. Blue trend lines in (B) and (C) show the degree of linear

regression with a 95% confidence interval.

(D) Hierarchical clustering of all strains based on global gene expression profiles in

(A). The 2 main clusters C1 (red) and C2 (blue) are indicated.

(E) Dot plot showing the number of CDSs in C1 (red) and C2 (blue),. The significance

between the two groups was calculated using unpaired t-test. ****P < 0.0001.

Figure 3. Cross-Microbial Human Pathogens Dynamically Respond to Changes

in the Environment

(A) Proportion of genes differentially regulated in at least one condition for each

bacteria. The linked heat maps show proportions of regulated genes for each stress

condition. The scales to the right show zero (white) to highest (green) proportion for

each species. Heat map squares marked with a cross indicate that RNA-seq was not

performed for that specific condition.

(B) Dot plot showing the percentages of genes differentially regulated in each condition

for all included species. The bar plot show the mean value for each condition.

(C) Dot plot showing percentages of differentially regulated genes in each condition for

Gram-negative (gray) and -positive bacteria (orange). The bar plots show the mean

value for each condition. The significance between the groups was calculated with

Multiple t-test using Holm-Sidak method. * indicates adjusted p-value < 0.05.

Figure 4. Transformation of differential expression to PTDEX scores enables

comprehensive understanding of gene regulation under stress conditions.

(A) Schematic illustration of clustering of 105 088 genes based on functional orthology

(KO) and isofunctional homology (PGFam) and resulting KO and PGFam gene groups.

(B) Equation used to calculate PTDEX scores for KO and PGFam gene groups.

31 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

(C) Illustration of the transformation of differential expression values of genes from

many pathogens that are clustered in one gene group into a PTDEX score for each

stress condition. Eight gene groups with highest PTDEX scores in nitrosative stress

condition are shown as example. White rectangles indicate no differential expression.

(D) The 20 KO and PGFam groups with highest PTDEX scores in low iron (top) and

oxidative stress (bottom) conditions. The size of gray dots after the gene name relates

to the number of genes in the gene groups.

Figure 5. Gram-negative and -positive bacteria exhibit distinct responses.

(A) Venn diagram of individual and shared PGFam groups with a PTDEX score >0 in

at least one of the stress conditions.

(B) Heat map showing degree of similarity between responses to different stresses in

Gram-negative and -positive bacteria. Similarity was calculated using the Pearson

correlation coefficient of PGFam groups’ PTDEX scores in each condition.

(C) Heat map showing degree of similarity between responses to different stress

conditions in Gram-negative and -positive bacteria. Similarities were calculated as in

(B) and the similarity distances shown in a dendrogram.

(D) Modules generated by CemiTool showing conditions that involve similar regulation

of gene groups with high PTDEX scores in Gram-negative and -positive bacteria. n

indicates number of PGFam groups within each module. The most enriched KEGG

pathways/processes for each module are shown with the number of gene groups

indicated in brackets. See also Table S5 and S6.

Figure 6. Complex stress conditions commonly involve differential expression

from non coding regions, including tmRNA and novel non-coding RNAs.

32 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

(A) Ratio of percent of reads mapped to non-CDSs under each stress condition in

comparison to control for all tested strains. Different stress conditions are indicated by

color. The gray line indicates Stress/Control value equal to one.

(B) Proportion of reads mapped to tmRNA sequences of total number of mapped reads

under each stress condition for all tested strains.

(C) Proportion of reads mapped to tmRNA sequences of total number of reads mapped

to non-CDSs under each stress condition for all tested strains.

(D) Proportion of reads mapped to the K. pneumoniae CsrB and to the region covering

KPN_01149 CDS with 5’-UTR and 3’-UTR sequences under each stress condition.

(E) Number and aligments of reads mapped to KPN_01149 and its UTRs in stationary

phase. Green indicates forward reads, red indicates reverse reads, and blue indicates

paired reads. The vertical dashed line indicates the position of the TSS of KPN_01149

(Kim et al., 2012).

(F) Proportion of reads mapped to the 1230-nt intergenic region in S. aureus MRSA

and MSSA genomes under each stress condition.

(G) Number of reads mapping to the 1230-nt region in S. aureus MRSA and MSSA

strains in stationary phase. Colors represent reads as noted in (E).

Different stress conditions are indicated with colored dots on the right.

The means of three replicates in each condition were used for the calculation in (A),

(B), (C), (D), and (F).

Figure 7. PATHOgenex RNA atlas provides opportunities for retrival of gene

expression data including differential expression values, PTDEX scores and

gene expression values.

(A) Differential expression of the three T6SS operons in P. aeruginosa PAO1 under 11

stress conditions.

33 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

(B) Top 5 PGFam groups with highest PTDEX scores in virulence inducing condition.

(C) Differential expression of adhE in all strains carrying the gene.

(D) Expression levels (TPM) of adhE across all conditions in three strains

representative of Gram-negative and -positive strains.

The differential expression in (A) and (C) are shown as log2 fold change with FDR-

corrected p-value < 0.05.

Figure S1. Stress exposure and experimental setup. Related to Figure 1. Three

bacterial cultures were inoculated from bacteria grown in the stationary phase for the

stress exposure experiments. Cultures, except control samples, were exposed to the

stress conditions at exponential growth (OD600 0.1-0.5). Transcription was stopped by

adding 0.05% (final concentration) phenol:ethanol solution. Cells were pelleted and

homogenized in Trizol solution and stored at -80 oC until RNA extraction step. The

libraries were generated with RNAtag-Seq method from total RNA. The specific

culturing and stress conditions for each species are indicated in Table S1. L.

pneumophila and M. tuberculosis were not exposed to the virulence inducing condition,

no such condition has been described.

Figure S2. The global expression pattern of biological replicates from the same

sample cluster. Related to Figure 2. Correlation coefficients (R) of the global

expression levels were plotted on a matrix and a dendogram showing the clustering of

replicates with complete linkage (below) for each species. R values were calculated by

the Pearson method. The matrices were generated using the “corrplot” R package and

complete linkage by the CLC Genomics Workbench.

34 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure S3. Differential expression of genes previously known to be regulated

under particular stress conditions and percentage of differentially expressed

genes in different groups of bacteria. Related to Figure 3.

(A) Differential expression of; bfd encoding bacterioferritin-associated ferredoxin, in

low iron conditions, fnr encoding fumarate and nitrate reductase under hypoxia, cstA

encoding carbon starvation protein under nutritional downshift, groL encoding heat

shock protein under high temperature conditions, dps encoding DNA protection during

starvation in the stationary phase, ahpC encoding alkyl hydroperoxide reductase C

under oxidative stress conditions, proW encoding glycine betaine/proline transport

system permease protein under osmotic stress conditions, hmp encoding

flavohemoprotein, which is involved in NO detoxification, under nitrosative stress

conditions.

(B) Aerobic (gray) and microaerophilic bacteria (orange).

(C) Bacilli (gray), (orange), and Gammaproteobacteria (red).

(D) C1 bacteria (gray) and C2 bacteria (orange) identified in Figure 2D.

Error bars indicate standard deviation of the mean. The significance between the

groups was calculated with Multiple t-test using Holm-Sidak method. ** indicates

adjusted p-value < 0.01.

Figure S4. Diverse transcription factors from Gram- negative and Gram- positive

bacteria shows disctinct PTDEX score patterns in bile stresses. Related to

Figure 5. PTDEX scores of sigma factors, response regulators, and one-component

systems in both Gram-negative and -positive strains. Transcription factors in the

different bacteria were identified using the P2TF database (Ortet et al., 2012).

35 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure S5. PTDEX score ≥0.25 is highly accurate for indicating the probability of

gene groups being regulated under a particular stress condition. Related to

Table 1. Randomly picked three PGFam groups from Gram-negative strains with

PTDEX scores between 0.25-0.3 and different numbers of genes included for each

stress condition to show how the PTDEX score indicates the differential expression of

the gene groups. Each box represents an individual gene from the gene group. Red

indicates downregulation, blue indicates upregulation, white indicates not regulated,

and crossed white indicates a gene for which expression could not be measured due

to lack of RNA.

Figure S6. PTDEX scores identify universal stress responder genes involved in

responses to multiple stresses in both Gram-negative and -positive strains.

Related to Table 1. USRs were clustered in different biological processes using the

PATRIC Subsystems tool and are shown with their general PTDEX scores over the 11

stress conditions.

Figure S7. Sequence homology of KPN_01149 paralogs in K. pneumoniae and

expression of the intergenic region in S. aureus from in vivo samples. Related

to Figure 6.

(A) Multiple sequence alignments of KPN_01149 and its paralogs in K. pneumoniae.

Sequences marked with blue indicate the 5’-UTR and 3’-UTR. Red-colored nucleotides

indicate identical nucleotides among the three sequences.

(B) Reads mapped to S. aureus 6850 intergenic region described in Figure 6G during

acute infection and chronic infection in a murine infection model, and in vitro

exponential growth (Szafranka et al., 2014).

36 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 1

A Borreliella burgdorferi Mycobacterium tuberculosis Listeria monocytogenes Staphylococcus epidermidis Staphylococcus aureus (MRSA) Staphylococcus aureus (MSSA) Enterococcus faecalis Streptococcus pyogenes Streptococcus agalactiae Streptococcus suis Streptococcus pneumoniae Campylobacter jejuni Helicobacter pylori (G27) Helicobacter pylori (J99) Achromobacter xylosoxidans Burkholderia pseudomallei Neisseria meningitidis Vibrio cholerae Francisella tularensis Legionella pneumophila Aggregatibacter actinomycet. influenzae Yersinia pseudotuberculosis Klebsiella pneumoniae Salmonella enterica Typhimurium Shigella flexneri Escherichia coli (ETEC) Escherichia coli (UPEC) Escherichia coli (EPEC)

Spirochaetales Gram (+) Corynebacteriales Bacilli Gram (-) Epsilonproteobacteria Diderm Betaproteobacteria Aerobic growth Gammaproteobacteria Microaerophilic growth

B Stress conditions Ctrl As Bs Li Hyp Nd Ns Oxs Oss Sp Tm Vic

Total vs. x3 RNA

C RNAtag-Seq

Total Barcoded RNA total RNA Pooled 36 barcoded rRNA depletion RNA samples Library preparation

Sequencing 1122 libraries bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 2

A Number Constitutively Active in Constitutively D of genes active certain conditions silent A. xylosoxidans B. burgdorferi P. aeruginosa H.pylori (J99) E. coli (ETEC) H. pylori (G27) S. enterica Typhimurium C. jejuni E. coli (EPEC) S. pyogenes E. coli (UPEC) H. influenzae K. pneumoniae

N. gonorrhoeae C1 F. tularensis A. baumannii S. suis B. pseudomallei S. pneumoniae L. monocytogenes S. agalactiae E. faecalis A. actinomyc. S. flexneri S. epidermidis V. cholerae N. meningitidis Y. pseudotuberculosis E. faecalis S. pneumoniae S. aureus (MSSA) S. aureus (MRSA) L. monocytogenes S. aureus (MSSA) S. aureus (MRSA) L. pneumophila V. cholerae M. tuberculosis A. baumannii A. actinomyc. M. tuberculosis S. suis S. flexneri S. epidermidis Y. pseudotuberculosis S. pyogenes E. coli (UPEC) S. agalactiae S. enterica Typhimurium

C2 H. influenzae E. coli (EPEC) N. meningitidis E. coli (ETEC) F. tularensis P. aeruginosa L. pneumophila K. pneumoniae C. jejuni B. pseudomallei N. gonorrhoeae A. xylosoxidans H. pylori (G27) 0 0 0 0 20 40 60 80 20 40 60 80 20 40 60 80 H. pylori (J99) 100 100 100 2000 4000 6000 B. burgdorferi Number of genes % of constitutively active genes % of genes active in certain conditions % of constitutively silent genes

8000 0.08 B M. tuberculosis C E **** 80 0.06 6000 P. aeruginosa

0.04 4000 60

0.02 2000 Constitutively regulator/CDS

40 Transcriptional active genes(%) Number of CDSs 0.00 0 2000 3000 4000 5000 6000 2000 3000 4000 5000 6000 C1 C2 Number of CDSs Number of CDSs bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 3

A As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic A. xylosoxidans 0 33 A. baumannii 0 46 A. actinomyc. 0 46 B. burgdorferi 0 43 B. pseudomallei 0 57 C. jejuni 0 29 E. faecalis 0 50 E. coli (EPEC) 0 38 E. coli (ETEC) 0 42 E. coli (UPEC) 0 44 F. tularensis 0 54 H. influenzae 0 56 H. pylori (G27) 0 57 H. pylori (J99) 0 56 K. pneumoniae 0 36 L. pneumophila 0 64 L. monocytogenes 0 63 M. tuberculosis 0 26 N. gonorrhoeae 0 31 N. meningitidis 0 44 P. aeruginosa 0 53 S. enterica Typhimurium 0 43 S. flexneri 0 46 S. aureus (MRSA) 0 48 S. aureus (MSSA) 0 47 S. epidermidis 0 56 S. agalactiae 0 42 S. pneumoniae 0 68 S. pyogenes 0 59 S. suis 0 39 Vibrio cholerae 0 51 Y. pseudotuberculosis 0 45 0 20 40 60 80 100 Differentially expressed genes (%)

B C Gram (-) 60 60 Gram (+) ** 40 40

20 20 % of regulated genes 0 % of regulated gene s 0 Li Li As Bs Ns Sp As Bs Nd Ns Sp Nd Vic Vic Tm Tm Hyp Hyp Oss Oxs Oss Oxs bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 4

A Functional D Low iron 63831 6054 Orthologs genes KO groups (KOs) Size As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic 105088 Size 5 genes bfd ibpA 25 Isofunctional 97565 27999 mntC clpB 50 homologues ABC.FEV.P bfd 75 genes PGfam groups (PGfams) ABC.FEV.S sitB 100 mntB pspA 150 ibpA sitC Size (n genes) ABC.FEV.A bfr B exbD ahpC 8 n on n on exbB adhE genes ( ) • species ( ) •log 1+ n on 2 ()genes ( ) clpB dnak 6 ngenes total n total •n database tolR htpG ( ) strain ( ) strain ( ) tonB ibpB psaA, scaA sufA cspA hspQ 4 psaC, scaB tonB C TC.FEV.OM2 exbD As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic Gene groups mtsA groES 2 TC.FEV.OM feoB groES yhcN

bfr nuoA PTDEX score 0 PTDEX score calculation KO groups PGfam groups hmpA for each condition

Oxidative stress nrdG As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic Size Size 5 lexA lexA 25 As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic iscA ahpC 50 sodB ahpF 75 ahpC sodA 100 dinI mntB 150 Size (n genes) nrdD fdoG, fdhF nrfA iscR nrdD dinI katG 8 ABC.FEV.A ibpA yjjAraiA cspA adhE nfuA 6 Individual genes in different strains umuC PTDEX Scores rtpR sufA exbD 0 2 4 6 8 mntC nrfA nrfD 4 sdhA, frdA cspE grcA ahpF nrfB ABC.FEV.S nrfC 2 ABC.FEV.P pflB ibpAibpB sdhD, frdD iscS

ibpA iscA PTDEX score 0 KO groups PGFam groups (log2 FoldChange) 05 -5 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 5

A B Gram-negatives As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic

As Bs Gram-positives Li Hyp Gram-positives 5981 1124 1779 Nd Ns Oss Oxs Gram-negatives Sp Tm Vic

Pearson’s r 1 0

Gram-negatives C D Gram-negatives KEGG Pathways/Proceesses KEGG Pathways/Proceesses 0 Biosynthesis of sec. metabolites (143) Biosynthesis of sec. metabolites (4) Bs 3 N-module 1 2 N-module 2 (n=768) Carbon metabolism (68) (n=61) Purine-Pyrimidine met. (4) As Ribosome (50) 2 Carbon fixation (2) Oss Biosynthesis of amino acids (42) Biosynthesis of amino acids (2) Two-component system (35) 1 Li Li Nd Oxs Cell cycle (2) 1 Hyp Sp ABC transporters (30) Ns Selenocompound met. (2) Oxs PTDEX score Oxidative phosphorylation (26) Ns 0 Quorum sensing (24) 0 Thiamine met. (1) Hyp 1 2 N-module 3 Glyoxylate and dicarboxylate met. (3) 2 N-module 4 Tryptophan met. (3) Nd (n=35) Biosynthesis of sec. metabolites (3) (n=41) Oxidative phosphorylation (2) Retinol met. (1) Glyoxylate and dicarboxylate met. (2) Sp Ns Oxs Vic 1 Pyruvate met. (1) 1 Longevity regulating pathway (2) Hyp Sp Pentose and glucuronate intercon. (1) Hyp Two-component system (2)

Tm PTDEX score Pearson’s r Pearson’s Fatty acid degradation (1) Sulfur relay system (1)

Li Tyrosine met. (1) Base excision repair (1) Bs As Ns Sp Nd 0 Vic 0 Tm Oss Oxs Hyp N-module 5 Oxidative phosphorylation (7) N-module 6 Valine, leucine and isoleucine biosyn (1) (n=30) Biosyn. of secondary metabolites (5) (n=29) Protein export (1) Oss Methane met. (4) 1 Glycerophospholipid met. (1) 1 Hyp Tm Glyoxylate and dicarboxylate met. (3) Biosynthesis of amino acids (1) Purine met. (2) Starch and sucrose met. (1) Gram-positives PTDEX score Porphyrin and chlorophyll met. (2) PTDEX score PTDEX score PTDEX score Quorum sensing (1) LPS biosynthesis (1) 2-Oxocarboxylic acid met. (1) Li 0 0 0 As Gram-positives Bs Oxs KEGG Pathways/Proceesses KEGG Pathways/Proceesses Ns 3 P-module 1 Ribosome (48) P-module 2 Biosynthesis of amino acids (8) (n=224) ABC transporters (20) (n=60) ABC transporters (6) Oss 2 Biosynthesis of amino acids (18) 2 Butanoate met. (5) Hyp Sp Nd Nd 1 Two-component system (16) Purine metabolism (4) 1 (14) 1 Phosphotransferase system (4) Hyp Tm Sp PTDEX score Purine-Pyrimidine met. (13) PTDEX score Fatty acid metabolism (3) 0 Quorum sensing (12) Glyoxylate and dicarboxylate met. (3) Tm 0 Vic P-module 3 ABC transporters (4) Pearson’s r Pearson’s (n=27) Degradation of aromatic compounds (3) Li

As Bs 2 Ns Sp Nd Vic Tm Glycine, serine and threonine met. (3) Oxs Oss Hyp Hyp Retinol met. (2) 1 Cysteine and methionine met. (2)

PTDEX score Glycolysis (2) 0 Pyrimidine met. (2) bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 6

A

A. xylosoxidansA. baumanniiA. actinomyc.B. burgdorferiB. pseudomalleiC. jejuniE. faecalisE. coliE. (EPEC) coli E.(ETEC) coli F.(UPEC) tularensisH. influenzaeH. pyloriH. pylori(G27)K. (J99)pneumoniaeL. pneumophilaL. monocytogenesM. tuberculosisN. gonorrhoeaeN. meningitidisP. aeruginosaS. entericaS. flexneri TyphimuriumS. aureusS. aureus (MRSA)S. epidermidis (MSSA)S. agalactiaeS. pneumoniaeS. pyogenesS. suisV. choleraeY. pseudotuberculosis 3.5

2.8

2.1

1.4

0.7

Stress/Control As Bs 0 B 50 Li Hyp 40 Nd 30 Ns Oss 20 Oxs

% tmRNA of total % tmRNA 10 Sp Tm 0 Vic C 80 70 Ctrl 60 50 40 30 20 10 0 % tmRNA of non-coding % tmRNA 50 50 1304465 KPN_01149 1304782 D E F G 6 SAR2467 SAR2468 5 40 40 4 3 1.4 2 30 30 1 MRSA 0 0.7 20 20 1.8 SAS2270 SAS2271 1.5 1.2 10 0 10 Transcriptional 0.9 % reads mapped % reads mapped Mapped reads (M) start site 0.6 0 0 Mapped reads(M) 0.3 MSSA 0

CsrB MRSAMSSA

KPN_01149 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.29.177147; this version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 7

A B As Bs Ns Sp Tm Li Hyp Nd Oss Oxs Vic PA0077 PA0078 PA0079 PA0080

PA0081 As Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic PA0082 PGF_09084129 adhE PA0083 PGF_06077968 ftn PA0084 PGF_00644224 manZ PA0085 T6SS-1 PGF_10382119 malK PA0086 PGF_00027674 osmY PA0087 PA0088 PTDEX score PA0089 0 2 4 6 8 PA0090 P. aureginosa P. PA0091 PA0092 PA0093 PA0094 PA0095 PA1656 PA1657 PA1658 PA1659 C As PA1660 adhE Bs Li Hyp Nd Ns Oss Oxs Sp Tm Vic T6SS-2 PA1661 A. actinomycetemcomitans PA1662 E. coli (UPEC) PA1664 E. coli (EPEC) PA1665 E. coli (ETEC) PA1666 K. pneumoniae P. aureginosa P. PA1667 S. flexneri PA1668 S. enterica Typhimurium PA1669 V. cholerae Gram-negatives PA2360 Y. pseudotuberculosis PA2361 E. faecalis PA2362 L. monocytogenes PA2363 S. epidermidis PA2364 S. aureus (MSSA) T6SS-3 PA2365 S. aureus (MRSA) PA2366 S. agalactiae PA2367 S. pneumoniae Gram-positives PA2368 S. pyogenes PA2369 S. suis P. aureginosa P. PA2370 PA2371 FoldChange (Log2) PA2373 -15 -5 -2 -1 -0.4 0.4 1 2 5 15 FoldChange (Log2)

-15 -5 -2 -1 -0.4 0.4 1 2 5 15

E. coli (EPEC) adhE K. pneumoniae adhE S. enterica Typhimurium adhE D 8000 10000 2000 8000 6000 1500 6000 4000 1000 TP M TP M TP M 4000 2000 500 2000

0 0 0 Li Li Li As Bs As Bs As Bs Ns Sp Ns Ns Sp Sp Nd Nd Nd Vic Vic Vic Tm Tm Tm Ctrl Ctrl Ctrl Hyp Oss Oxs Oss Oxs Oss Oxs Hyp Hyp

30000 L. monocytogenes adhE 5000 S. aureus (MRSA) adhE 6000 S. epidermidis adhE

4000 20000 4000 3000 TP M TP M TP M 2000 10000 2000 1000

0 0 0 Li Li Li As As Bs As Bs Bs Ns Ns Sp Ns Sp Sp Nd Nd Nd Vic Vic Vic Tm Tm Tm Ctrl Ctrl Ctrl Oss Oxs Oss Oxs Hyp Hyp Hyp Oss Oxs