SYSTEMS BIOLOGY

Metabolic and genetic basis for auxotrophies in Gram-negative species

Yara Seif (a,1), Kumari Sonal Choudhary (a,1), Ying Hefner (a), Amitesh Anand (a), Laurence Yang (a,b), and Bernhard O. Palsson (a,c,2)

(a) Systems Biology Research Group, Department of Bioengineering, University of California San Diego, CA 92122; (b) Department of Chemical Engineering, Queen's University, Kingston, ON K7L 3N6, Canada; and (c) Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Lyngby, Denmark

Edited by Ralph R. Isberg, Tufts University School of Medicine, Boston, MA, and approved February 5, 2020 (received for review June 18, 2019)

Auxotrophies constrain the interactions of bacteria with their environment, but are often difficult to identify. Here, we develop an algorithm (AuxoFind) using genome-scale metabolic reconstruction to predict auxotrophies and apply it to a series of available genome sequences of over 1,300 Gram-negative strains. We identify 54 auxotrophs, along with the corresponding metabolic and genetic basis, using a pangenome approach, and highlight auxotrophies conferring a fitness advantage in vivo. We show that the metabolic basis of auxotrophy is species-dependent and varies with 1) pathway structure, 2) enzyme promiscuity, and 3) network redundancy. Various levels of complexity constitute the genetic basis, including 1) deleterious single-nucleotide polymorphisms (SNPs), in-frame indels, and deletions; 2) single/multigene deletion; and 3) movement of mobile genetic elements (including prophages) combined with genomic rearrangements. Fourteen out of 19 predictions agree with experimental evidence, with the remaining cases highlighting shortcomings of sequencing, assembly, annotation, and reconstruction that prevent predictions of auxotrophies. We thus develop a framework to identify the metabolic and genetic basis for auxotrophies in Gram-negatives.

systems biology | comparative genomics | mathematical modeling | auxotrophy | pangenome

Significance

Nutrient requirements play an important role in host–microbe interactions, and can heavily affect the composition of microbial communities by setting hard constraints on microbe–microbe interactions. The auxotrophic capabilities of strains and their underlying genetic and metabolic basis have rarely been studied on a large scale. This paper presents a computational approach using genome-scale models of metabolism and genomic sequences to predict auxotrophies in over 1,300 Gram-negative strains. Our network-based approach identifies auxotrophs, their nutrient requirements, and the corresponding causal genetic and metabolic basis, making it one of the most comprehensive efforts in large-scale strain-specific auxotrophy prediction.

Author contributions: Y.S. and B.O.P. designed research; Y.S., Y.H., A.A., L.Y., K.S.C., and B.O.P. performed research; Y.S., Y.H., and A.A. contributed new reagents/analytic tools; Y.S. and K.S.C. analyzed data; and Y.S. and K.S.C. wrote the paper.
The authors declare no competing interest.
This article is a PNAS Direct Submission.
This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).
Data deposition: AuxoFind is available on GitHub, with an example notebook, at https://github.com/yseif/AuxoFind. All genomic sequences analyzed in this study are publicly available on The Pathosystems Resource Integration Center (PATRIC).
1 Y.S. and K.S.C. contributed equally to this work.
2 To whom correspondence may be addressed. Email: [email protected]
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1910499117/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1910499117
Downloaded by guest on September 23, 2021

Host–pathogen interactions and pathogen–microbiota interactions are dictated by the availability of nutrients as well as the metabolic capability of each participant to transform nutrients into energy and biomass components. Many bacterial strains (both commensal and pathogenic) lose the capability to synthesize essential biomass precursors and become dependent on extracellular resources for survival despite having prototrophic ancestors. Some auxotrophies arise as a result of the strain's adaptation to a specific host or environment through the formation of small-scale deleterious mutations leading to gene loss (1). For example, a methionine requirement is common among Pseudomonas aeruginosa strains isolated from cystic fibrosis patients (2–4), a requirement likely satisfied by the high concentration of amino acids in the patient's sputum (5). Nutrient auxotrophs are often host-associated (6), obligate pathogens have a reduced genome (7), and free-living auxotrophs retain a fitness advantage in their specific niche (8). Additionally, auxotrophies dictate the carbon and energy flow as well as the stability of endosymbiotic communities (9). Auxotrophies have been exploited 1) as markers for the identification of strain microenvironment and lifestyle (10–12), 2) for the detection and elucidation of microbial ecosystems (13–17), 3) for the design of live attenuated vaccines (18, 19), 4) for their molecular targeting (20, 21), and 5) for the design of tumor therapy (22–24).

The identification of an auxotroph's nutrient requirements is experimentally difficult, occurs on a strain-by-strain basis, and is rarely accompanied with an identified causative genetic or genomic lesion. Comparative genomic analyses suggest that deleterious disruption of biomass precursor biosynthetic pathways exist in most free-living microorganisms, indicating that they rely on cross-feeding (25). However, it has been demonstrated that amino acid auxotrophies are incorrectly predicted as a result of the insufficient number of known gene paralogs (26). Additionally, these methods rely on pathway completeness, with a 50% cutoff used to determine auxotrophy (25). A mechanistic approach is expected to be more appropriate and can be achieved using genome-scale models of metabolism (GEMs). For example, requirements can arise by means of a single deleterious mutation in a conditionally essential gene (CEG), or as a result of a combination of deletions, in which case they would escape detection via comparative genomics. Given the high interconnectedness of metabolic networks, manually finding such sets is not a computationally trivial task but can be efficiently approached using GEMs.

GEMs are assembled based on genome annotation and curation of published literature (27, 28). They contain the most up-to-date metabolic networks linking reactions with genes according to experimentally validated mechanisms (28–30). Once they are converted into a mathematical format (27, 31), flux balance analysis (32) can be used to identify essential genes and auxotrophies that result from gaps in the network (15, 31, 33, 34). However, a workflow for this purpose has yet to be formalized into a

mathematical problem and developed into a fully fledged algorithm that can be reused across the community. Here, we develop a custom algorithm (AuxoFind) using comparative genomics coupled with metabolic modeling to computationally predict auxotrophies and pinpoint the corresponding genetic basis.

Results

AuxoFind Predicts Auxotrophy from Genomic Sequences Using GEMs. As a first step toward determining strain-specific auxotrophy, we collected available curated GEMs for Gram-negative bacteria from the Biochemically, Genetically and Genomically structured genome-scale metabolic network reconstructions (BiGG) database (15, 31, 35–37). We proceeded to download and quality check genomic sequences from the PATRIC database (38), including 408 Escherichia, 491 Salmonella, 91 Yersinia, 142 Pseudomonas (39), 267 Klebsiella, and 36 Shigella sequences (Fig. 1, Dataset S1, and SI Appendix, SI Materials and Methods). For each of the six GEMs, one for each genus, we predicted CEGs for aerobic growth on minimal medium. CEGs differ from absolutely essential genes in that their absence can be compensated for by the addition of an extracellular nutrient. In other words, if a strain is missing a CEG, it is auxotrophic for one or more nutrients. In contrast, a strain cannot survive without any one of its essential genes, regardless of the nutritional background.

We then homology mapped all of the modeled genes to other strains within the same genus to identify the strains which are lacking one or more CEG, and developed a custom algorithm (AuxoFind) which predicts nutrient requirements from a list of present and absent metabolic genes using flux balance analysis (SI Appendix, SI Materials and Methods) (32). AuxoFind exploits the mechanistic link between enzymatic functions and prototrophy encoded in GEMs, taking into account metabolic and genetic redundancy, and using the genomic background of each strain as input. Applying AuxoFind allows the user the flexibility to choose a growth medium and a biomass objective function to take into full account the strain's metabolic environment. This, in turn, allows for the analysis of auxotrophies as a result of changing nutrient sources, or biomass requirements. In addition, instead of returning a single solution, AuxoFind can be set to output multiple alternative solutions as well as suboptimal solutions.

Applying AuxoFind to the strains collected from PATRIC, we predicted a total of 58 strains to be auxotrophic for at least one nutrient, 4 of which (Salmonella enterica serovar Bovismorbificans str. 3114, serovar Enteritidis str. EC20090884, serovar Ouakam, and serovar Paratyphi A strain A73-2) were subsequently selected out [see Experimental Validation of Auxotrophies Highlight Technological Shortcomings at Multiple Levels for filtering criteria through Basic nucleotide Local Alignment Search Tool (BLASTn)]. The final results included 11 Salmonella strains, 18 Yersinia strains, 15 Escherichia strains, 5 Pseudomonas putida, and 5 Klebsiella strains. The predicted auxotrophies in these 54 strains are analyzed in detail below.

The Majority of Predicted Nutrient Requirements Were Specific. We classified the predicted nutrient auxotrophies into two categories: specific and nonspecific. Specific auxotrophies occur when the strain requires a specific nutrient to be added to minimal medium in order to grow, while a strain with a nonspecific auxotrophy can grow when any of a selection of nutrients is added to minimal medium. The requirement for amino acids was found to be predominantly specific (Fig. 2A), while requirements for nucleotides were nonspecific (Fig. 2B). The specificity of amino acid auxotrophy is due to the structure of the metabolic pathways, and the irreversibility of intermediate steps. In contrast, nucleotide biosynthesis can be achieved via multiple routes (including purine and pyrimidine biosynthesis), as well as nucleotide salvage and interconversion. In the latter subsystem, there are multiple redundant pathways, few of which are irreversible, resulting from the promiscuity of participating enzymes.

Interestingly, multiple auxotrophies which were predicted across isolates involved nutrients known to be important in host–pathogen interactions, suggesting that these auxotrophies may give selective advantage during host–pathogen interactions. For example, specific auxotrophies for branched chain amino acids (BCAAs: L-isoleucine, L-leucine, and L-valine) were shared across 11 strains, 5 of which were isolated from human samples. Intracellular levels of BCAAs play a critical role in host–pathogen interactions, affecting both pathogenicity and immune activation (40, 41). Similarly, L-tryptophan (n = 2) constitutes a resource over which the host and the pathogen compete (42), niacin (n = 5) affects the pathogen's virulence and its detection by the immune system (43), and tetrathionate (n = 1) is a gut inflammation by-product which is known to provide a respiratory electron acceptor in Salmonella (44). In total, 72% (107 out of 149) of predicted auxotrophies were specific, with some strains having multiple specific and/or nonspecific predicted requirements (Dataset S2). Notably, we observed a

[Fig. 1 graphic. Panel labels: PATRIC database for bacterial genomic sequences (n = 1,435): Escherichia (n = 408), Salmonella (n = 491), Yersinia (n = 91), Klebsiella (n = 267), Pseudomonas (n = 142), Shigella (n = 36); QC/QA of assemblies (No. CDSs, No. 'N'; n = 130, n = 1,305); genome annotation; comparative genomics (presence/absence of genes); genome-scale metabolic models (iML1515, STM.v1.2, iPC815, iYL1228, iJN1411, iPAU1129); flux balance analysis; conditionally essential genes (CEGs); AuxoFind(); predicted nutrient requirements; QC/QA with BLASTn.]

Fig. 1. Workflow chart: Genomes were downloaded from PATRIC (38) and quality controlled based on completeness, number of annotated coding DNA sequences, and percentage of unassigned nucleotide sequences. Manually curated GEMs were queried from BiGG (35), and used to identify CEGs in minimal medium. Each GEM was used across strains of the same genus, except for iML1515, which was used for both Escherichia and Shigella strains, iJN1411, which was only used for P. putida, and iPAU1129, which was only used for P. aeruginosa. Next, we identified the list of missing metabolic genes in each strain through comparative genomics using genomic sequences as an input. We used AuxoFind to predict auxotrophies and their genetic basis using, as input, the identified list of missing genes. Finally, when a missing gene was linked to a predicted auxotrophy, we verified its absence algorithmically using BLASTn.
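The last step of the workflow, going from a list of missing genes to a predicted nutrient requirement, can be illustrated with a minimal sketch. It is not the AuxoFind implementation: it swaps flux balance analysis for a simple Boolean reachability check on a hypothetical three-reaction L-tryptophan network, and the reaction names, the use of chorismate as a stand-in for upstream metabolism, and the function names `producible`, `grows`, and `predict_requirement` are all assumptions made for illustration. The alternative tnaA route is deliberately left out.

```python
# Hypothetical toy network: name -> (genes that must all be present, substrates, products).
REACTIONS = {
    "chorismate_to_IGP": (frozenset({"trpC", "trpD", "trpE"}),
                          {"chorismate"}, {"indoleglycerol_phosphate"}),
    "IGP_to_trp": (frozenset({"trpA", "trpB"}),
                   {"indoleglycerol_phosphate"}, {"L-tryptophan"}),
    "indole_to_trp": (frozenset({"trpA", "trpB"}),
                      {"indole"}, {"L-tryptophan"}),
}
MEDIUM = frozenset({"chorismate"})      # assumption: stands in for minimal medium
BIOMASS = frozenset({"L-tryptophan"})   # assumption: single biomass precursor


def producible(genes, available):
    """Expand the metabolite pool until no enabled reaction adds anything new."""
    pool = set(available)
    changed = True
    while changed:
        changed = False
        for needed, substrates, products in REACTIONS.values():
            if needed <= genes and substrates <= pool and not products <= pool:
                pool |= products
                changed = True
    return pool


def grows(genes, nutrients):
    """Growth here means every biomass precursor is reachable."""
    return BIOMASS <= producible(genes, nutrients)


def predict_requirement(genes, importable):
    """Classify a strain from its present genes and the nutrients it can take up."""
    if grows(genes, MEDIUM):
        return "prototroph", []
    rescuers = sorted(n for n in importable if grows(genes, MEDIUM | {n}))
    if not rescuers:
        return "unresolved", []
    return ("specific" if len(rescuers) == 1 else "nonspecific"), rescuers


ALL_GENES = {"trpA", "trpB", "trpC", "trpD", "trpE"}
lost_trpCDE = ALL_GENES - {"trpC", "trpD", "trpE"}
# A strain able to take up indole has two rescuing nutrients; one without has a single one.
print(predict_requirement(lost_trpCDE, {"L-tryptophan", "indole"}))
print(predict_requirement(lost_trpCDE, {"L-tryptophan"}))
```

In this toy setting, the same gene loss is classified as nonspecific when several single nutrients restore growth and as specific when only one does, which mirrors the distinction drawn in the text. A real analysis would replace the reachability check with flux balance analysis on a curated GEM.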

specific biotin requirement (due to the absence of either bioAB, fabH, or fabI) across 5 Escherichia coli K-12 strains, an L-leucine requirement in 3 E. coli K-12 strains, multiple amino acid auxotrophies in Yersinia ruckeri strains, and an L-lysine requirement due to the absence of argD across 11 Yersinia pestis strains. To determine which strains were closely related, we constructed phylogenetic trees from the concatenation of all nucleotide polymorphism (single-nucleotide polymorphism [SNP]) count from the core genes using rapid core genome multi-alignment (ParSNP) (45). While the E. coli K-12 leucine auxotrophs, Y. ruckeri amino acid auxotrophs, and Y. pestis L-lysine auxotrophs were clustered together in a single subclade, the E. coli K-12 strains were not (SI Appendix, Figs. S1 and S2). Otherwise, the predicted auxotrophs were generally spread across the phylogenetic tree, with many subclades containing only one auxotrophic strain.

Fig. 2. In silico predictions of metabolic nutrient requirements across multiple Gram-negative species. (A) Nutrients for which a specific auxotrophy was predicted in at least one strain. (B) Top 15 metabolites for which a nonspecific auxotrophy was predicted (see Dataset S2 for the full results). Nutrient requirements were predicted for a total of 54 strains across Escherichia coli, Salmonella, Klebsiella, and Yersinia, with amino acids auxotrophies appearing with the highest frequency.

Amino Acid and Vitamin Auxotrophies Confer a Fitness Advantage In Vivo. We next sought to identify the effect of metabolic requirements on the fitness of auxotrophic strains in their native environment. We evaluated published fitness profiles for E. coli strains UTI89 and EC958 and S. enterica subsp. enterica ser. Typhimurium str. SL1344 mutants across various in vitro and in vivo environments (including mouse spleen, cattle, pig and chicken intestine, human serum, and bladder cell model) (46–51). Briefly, fitness profiles are derived through transposon-directed insertion-site sequencing (TRADIS), and a fitness measure is calculated across mutants by comparing the number of reads between the inocula and output samples. We posit that a strain with a disrupted CEG (as defined in AuxoFind Predicts Auxotrophy from Genomic Sequences Using GEMs) has an increased fitness when the gene's function is dispensable. In this case, it is evident that the mutant's auxotrophy is at least partially compensated for by a favorable nutritional background. Conversely, loss of function (and therefore nutrient dependence) resulting in reduced fitness indicates an unfavorable environment and/or insufficient access to important metabolites. Nutrient dependence was predicted by AuxoFind, for both aerobic and anaerobic conditions, by simulating auxotrophy for each transposon mutant by knocking out the disrupted gene in silico (Datasets S3 and S4).

TRADIS yields a fitness measure (log2 fold change) for each mutant, and we filtered for an adjusted P value smaller than 0.05. A total of 960 measured log2 fold changes passed these thresholds. We considered those log2 fold changes larger than 1 to be beneficial and those smaller than −1 to be detrimental. Only 25 CEGs yielded increased fitness upon disruption in at least one condition, while 70 were detrimental (in ≥1 conditions) (Fig. 3). At the gene level, 5 out of 15 in S. enterica and 6 out of 12 in E. coli of the CEGs whose disruption increased fitness were lost in one or more natural isolates (highlighted in bold in Fig. 3; Datasets S3 and S4). Of note, the disruption of two out of five and five out of six of these CEGs in S. enterica and E. coli, respectively, yielded increased fitness in one condition but decreased fitness in another, suggesting that the conferred fitness advantage is niche-specific. For example, frdD and argH disruption in S. enterica were beneficial in cattle intestine but detrimental in chicken intestine.

Additionally, fitness change upon CEG disruption varied across phases of infection. For example, the disruption of bioH in E. coli was advantageous in the intracellular bacterial communities (IBC) phase but detrimental in later phases (dispersal and postdispersal phase). In contrast, leuA disruption yielded decreased fitness in the reversal phase but increased fitness in the dispersal and postreversal phases. Some beneficial nutrient requirements carried over from one stage of infection to the next. For example, L-arginine and L-cysteine auxotrophic mutants exhibited elevated fitness in three consecutive phases including IBC, dispersal, and postdispersal phases. In addition, a thiamin requirement was beneficial in four out of the

five tested phases of infection. Other auxotrophic mutants with increased fitness included biotin, 4-aminobenzoyl glutamate, L-isoleucine and L-valine, L-lysine, and L-leucine (49). In an intestinal infection of cattle with S. enterica, auxotrophic mutants with elevated fitness were auxotrophic for nicotinate (nadA), L-histidine (hisBF), L-cysteine (cysC), L-arginine (argBH), L-leucine (leuB), tetrathionate (frdD only under anaerobic conditions), and L-tryptophan (trpA).

[Fig. 3 graphic. Panels: bladder cell infection model, E. coli strain UTI89 (IBC phase, dispersal, post-dispersal, reversal, post-reversal); farm animal infection model, S. enterica strain SL1344 (cattle intestine, pig intestine, BALB/c liver). Entries are nutrient (disrupted gene) pairs, including L-arginine (argA or argG*), L-cysteine (cysI or cysN), thiamin (thiG or thiF*), biotin (bioH*), biotin (bioA), 4-aminobenzoyl-glutamate (pabA*), 4-aminobenzoyl-glutamate (pabB), L-isoleucine and L-valine (ilvC), L-lysine (lysA), L-leucine (leuA*), pyridoxine (pdxA), L-histidine (hisG), L-histidine (hisBF), L-cysteine (cysC), niacin (nadA), L-tryptophan (trpA), L-leucine (leuB), L-arginine (argBH*), tetrathionate (frdD*, O2-), and niacin (nadC).]

Fig. 3. Computationally predicted CEGs that were found to result in increased fitness (more than one log2 fold change, P value < 0.05) in mutant screens. For each condition in which the mutant fitness was tested, we list out both the nutrient for which it is predicted to be auxotrophic and the gene which has been disrupted. The fitness profiles were obtained from various sources in which the TRADIS workflow was applied. The bladder cell infection model was designed as a proxy for urinary tract infection. Genes highlighted in bold are lost across natural isolates. An asterisk (*) indicates that CEG knock-out yields a detrimental effect on fitness in other conditions. See Datasets S3 and S4 for the full dataset.

The metabolic basis for auxotrophies diverges across species as a function of their metabolic network topology and systems-level metabolic capabilities. A subset of genes conferred both specific and nonspecific auxotrophies, depending on both the location of the missing enzymatic function in the strain-specific metabolic pathway and the species-specific local structure of the network. For example, L-tryptophan biosynthesis can be achieved via three different routes in Escherichia, two in Salmonella, and only one in Yersinia (Fig. 4A). Strains across all three genera have the capability to synthesize L-tryptophan from chorismate (via trpABCDE), while only Salmonella and Escherichia strains are capable of indole transport and utilization, and only Escherichia strains can synthesize L-tryptophan via both tnaA and trpAB pathways. As a result, loss of trpCDE confers a nonspecific auxotrophy in Escherichia and Salmonella but a specific auxotrophy in Yersinia, while the loss of trpAB confers a specific auxotrophy in Salmonella and Yersinia but is not conditionally essential in Escherichia. In our dataset, both E. coli str. D6 and Yersinia aleksiciae str. 159 were missing the full trp operon. However, strain D6 was predicted to be a nonspecific L-tryptophan auxotroph, while strain 159 was predicted to have an L-tryptophan-specific requirement.

We observed cases in which the simultaneous supplementation of multiple nutrients was required to support growth. The multiplicity of nutrient requirements was caused either by the absence of multiple CEGs distributed across different pathways or by the participation of a CEG in multiple essential biosynthetic pathways. For example, ketol-acid reductoisomerase (ilvC) is essential for the biosynthesis of both L-valine and L-isoleucine. In the absence of ilvC alone, supplementation of both L-valine and L-isoleucine is required (Fig. 4B). Interestingly, ilvC, ilvD, and ilvE were lost across strains in multiple species including Klebsiella pneumoniae, S. enterica, and E. coli. There were also cases in which only the simultaneous absence of two or more genes (e.g., encoding isozymes) was predicted to confer an auxotrophy. For example, K. pneumoniae strain L201 and S. enterica ser. Newport str. 0307-213 were predicted to require L-arginine supplementation due to the absence of both acetylornithine deacetylase and ornithine carbamoyltransferase isozyme (Fig. 4C). Finally, we observed instances in which the alternative to supplementing with one nutrient was to supplement with multiple nutrients. For example, a shikimate auxotrophy was predicted for Klebsiella G5, Klebsiella michiganensis str. RC10, and Escherichia fergusonii str. ATCC35469 due to the absence of 3-dehydroquinate dehydratase (aroD). If shikimate is excluded from the set of acceptable supplementations, a requirement for multiple nutrients (including L-tyrosine, L-tryptophan, and L-phenylalanine) is predicted, making the shikimate requirement pseudospecific (Fig. 4D).

Small-Scale Mutations Constitute the Genetic Basis for Auxotrophies in P. aeruginosa and Shigella. Among the species studied, none of the P. aeruginosa or Shigella species in our dataset were predicted to be auxotrophic, despite extensive reports for amino acid auxotrophy across P. aeruginosa strains isolated from cystic fibrosis patients and a predominant niacin auxotrophy in Shigella strains (3, 14). Instead, we found that CEGs were highly conserved. Niche adaptation through small-scale loss-of-function mutations has been observed in strains including P. aeruginosa and Shigella (52–54). This result emphasizes that, in order to study auxotrophy development in host-adapted strains, future efforts should expand our workflow for the prediction of bacterial nutrient requirements (which is currently limited to the identification of genetic lesions at the gene level) to account for smaller-scale deleterious mutations. Here, we do not attempt to predict pseudogenization events, as this would constitute an effort of its own. However, for proof of concept, we demonstrate one such analysis for the well-known case of niacin auxotrophy in Shigella, extending it to all strains in our dataset.

Causal loss of function mutations in nadB (including A28V, D218N, and G74E) and in nadA (including W299X, P219L, C128Y, C113A, C200A, C297A, and A111V) result in a niacin requirement in natural strains of E. coli, Shigella, and S. enterica (14, 43, 55). We searched our dataset for these mutations and found a total of 71 strains carrying at least one of the validated SNPs and/or an indel or deletion of more than 10 amino acids (which are likely to result in protein structural variations and which we assume to be deleterious) (56, 57). The affected species were Shigella flexneri, S. enterica, Shigella sonnei, E. coli, Shigella dysenteriae, Shigella boydii, , Y. pestis, and Yersinia rohdei, and the SNPs found included A111V, C128Y, and P219L (nadA) and A28V and D218N (nadB) (Fig. 5A). Interestingly, subsets of deleterious mutations were restricted to different species. For example, C128Y and A111V mutations were restricted to S. flexneri strains, D218N and P219L were restricted to S. dysenteriae strains, and A28V was restricted to E. coli strains (Dataset S5). While the A111V mutation was initially described in S. enterica serovar Dublin, we only identify it here in strains of S. flexneri (55). Additionally, we observe 16 S. enterica serovar Enteritidis strains to carry large deletions in either nadA or nadB (or both in the case of 5 strains). These results demonstrate that convergent evolution

may have led to loss of the nicotinate biosynthesis capability across species. That the deleterious mutations were maintained across descendants indicates that a niacin requirement confers a selective advantage, an observation which is also supported in Amino Acid and Vitamin Auxotrophies Confer a Fitness Advantage In Vivo.

[Fig. 4 graphic. Pathway diagrams spanning extracellular space, periplasm, and cytoplasm: (A) chorismate to L-tryptophan via trpDE, trpD, trpC, and trpAB, with indole uptake and tnaA, shown for Yersinia, Salmonella, Escherichia, and multi-species cases (strains 159, D6, BL60006, SA20084644, L021); (B) pyruvate and L-threonine to L-valine and L-isoleucine via ilvB and ilvN or ilvH and ilvI, ilvC, ilvD, ilvE, and ilvA; (C) L-glutamate to ornithine and L-arginine via argA, argB, argC, argD, argE or astC, argI or argF, and argH (strains L201, KP11, yzsuk-4); (D) 3-dehydroquinate to shikimate and chorismate via aroD, aroE (or ydiB), aroK or aroL, aroA, and aroC, with L-tyr, L-trp, and L-phe (strains RC10, G5, ATCC35469). Each panel is labeled with the predicted auxotroph strain name.]

Fig. 4. (A) Specificity of L-tryptophan requirement as a function of species-specific systems-level pathway structure. (B) Multiple simultaneous specific auxotrophies as a result of single gene deletion. (C) Auxotrophies as a result of deletion of multiple isozymes. In K. pneumoniae str. KP11, argI and argF are both absent. (D) Pseudospecific auxotrophies in which the alternative to one nutrient requirement (e.g., shikimate) is the simultaneous requirement for multiple nutrients (e.g., L-phenylalanine, L-tryptophan, and L-tyrosine).

Genomic Basis of Auxotrophy. Next, we proceeded to examine the genomic basis of the auxotrophies predicted from our workflow. To assess the genetic changes at the strain level, we compared the genomic region of each predicted auxotroph with a closely related strain (SI Appendix, SI Materials and Methods). We observed multiple cases in which a missing CEG constituted a part of a larger deleted genomic fragment, in which multiple syntenic operons were lost simultaneously. We found that the number of missing syntenic genes surrounding an absent CEG varied from n = 1 to n = 251 open reading frames (ORFs), averaging 49 genes. S. enterica str. U288 was the only strain for which the missing chromosomal region contained only one CEG (hisD) (Fig. 5B). Conversely, S. enterica str. 0112-791 was missing two genomic regions (n = 251 ORFs and n = 28 ORFs) and did not carry a total of four CEGs.

When the conserved genes flanking the missing genomic fragment were adjacent in the auxotroph, we classified the loss as a simple deletion. There were 15 observed cases of simple multigene deletion events (Fig. 5C). For example, the deleted region in E. coli strains DHB4, C3026, and DH10B consisted of 21 genes, and the genes upstream and downstream of this region in E. coli MC4100 (rapA and fruR) are adjacent in the three auxotrophs. Conversely, K. pneumoniae strain L201 is missing a total of 246 contiguous ORFs with respect to strain L491. One deletion edge was located at chromosomal position 4,943,680, marking the end of the GenBank file, while the other was located at position 371, marking the start of the sequence for plasmid p1-L201. We suspect that an assembly error may explain this observation.

In the remaining instances, the genes which were located at the edges of the missing fragment were separated by multiple ORFs in the auxotroph. Interestingly, these ORFs constituted a prophage in four auxotrophs, suggesting that insertion by viral DNA may have mediated the deletion event (Fig. 5D). In particular, E. coli strain C321.∆A, which is a genomically recoded organism lacking bioAB, carried enterobacteria phage Λ [containing 15 genes, predicted by PHAge Search Tool (PHAST) (58)] in the deletion locus. Similarly, E. coli strain S50 (isolated from forest soil) carried a prophage sharing three ORFs with phage Stx2 at the locus for 152 contiguous missing genes.

A slightly more complex sequence of events affected S. enterica ser. Newport strain 0211-109 which was isolated from a cow with

[Figure 5 panels: (A) small-scale mutations in nadA and nadB across E. coli, Shigella, Salmonella, and Yersinia strains; (B) single gene deletion; (C) multi-gene deletion; (D) prophage induction mediated deletion; (E) combined deletion and homologous recombination; (F) divergent evolutionary trajectory.]

Fig. 5. The genetic basis for nutrient auxotrophy spans various levels of complexity. (A) Niacin auxotrophy due to known loss-of-function mutations in nadA and nadB as well as large in-frame deletions/insertions (>30 amino acids). (B) Single gene deletion of hisD in S. enterica strain T000240. (C) Simple multigene deletion with rejoining of deletion edges. (D) Phage insertion and phage-mediated multigene deletion. (E) Multigene deletion coupled with homologous recombination mediated by prophages and insertion sequences. (F) Divergent and ancient evolutionary trajectory across species.

gastroenteritis (Dataset S6 and Fig. 5E). At the locus of deletion (consisting of 252 genes), we found an insertion sequence cluster likely conferring beta-lactam resistance (18 ORFs, containing two copies of Class C beta-lactamase, three copies of small multidrug efflux transporters, and three copies of mobile element proteins). The auxotroph also carried a DNA internalization-related competence protein ComEC/Rec2 (involved in binding and uptake of transforming DNA) directly upstream of the insertion sequence cluster. Notably, the deletion region (spanning genes between pepN and potA) was located immediately downstream of prophage gifsy 2 (with 56 ORFs) in a close relative (strain 0112-791), but was relocated elsewhere in strain 0211-109. We hypothesize that genomic rearrangement was caused by the inserted cluster of genes. Similarly, 14 genes are deleted in E. coli strain RR1 (a derivative of K-12), including proA and proB, and 9 genes upstream of the deleted fragment (including 3 transposases) are redistributed across the genome. At the locus of deletion, RR1 carries multiple repeat regions denoting that transposition may have occurred. In E. coli strain HST04, the genes flanking the deletion region (pepD and ykfl) are located 2,251 ORFs apart, with an insertion sequence cluster consisting of 16 ORFs located downstream of pepD. Insertion elements can promote the rearrangement of bacterial genomes (59).

Finally, we observed four instances in which the predicted auxotrophs corresponded to species for which there was only one representative genome in our dataset. For example, Yersinia frederiksenii carries pdxT and gadCB between yjjG and lpp, while Yersinia kristensenii (which shares the largest number of gene families) carries a hypothetical protein ilvNB (involved in L-isoleucine biosynthesis). The absence of CEGs in these cases is likely a result of evolutionary events occurring after speciation (Fig. 5F), and a larger number of pertinent genomic sequences would be necessary to retrace the evolutionary history of these chromosomal regions.

Genome streamlining is often associated with niche adaptation and evolution toward symbiosis, and massive gene losses can occur on a small evolutionary timescale as a result of population bottlenecks (60). We asked whether the predicted auxotrophs had a reduced genomic sequence length with respect to other strains of the same genus. For each genus, we collected the strains’ sequence length and fit the observed distribution to a generalized extreme value distribution using the block maxima approach. We calculated the probability of a genome length to be less than or equal to each value in our dataset, and found that a total of 41 strains fell under a probability of 5%. Of those, six were predicted auxotrophs (including K. michiganensis strain RC10, K. pneumoniae strains KP11 and yzusk-4, K. G5, S. enterica str. 0112-791, and S. enterica str. 9-65), further supporting the hypothesis that these strains have developed auxotrophy as a result of niche adaptation. However, a Fisher’s exact test reveals that there is no significant enrichment of auxotrophs among the population of strains with reduced genomes (P value = 0.6), indicating that genome

streamlining is not a predominant phenomenon across the six genera.

Experimental Validation of Auxotrophies Highlight Technological Shortcomings at Multiple Levels. We proceeded to validate our predictions experimentally by evaluating the growth requirements of the predicted auxotrophs. We were able to obtain 16 strains [six E. coli (61–64), three Shigella (68, 69), three Salmonella (66, 67), three Yersinia (65), and one Klebsiella (70) strain] (Fig. 6). Out of 16 strains that we tested experimentally, 8 grew only when M9 + glucose media was supplemented with essential nutrient(s). The confirmed auxotrophies included 1) an L-proline requirement in E. coli strain HST04; 2) an L-leucine requirement in E. coli strains DHB4 and DH10B; 3) a niacin requirement in S. flexneri strain 2457T and S. sonnei strain 2015C-3794; and 4) an L-arginine requirement in E. coli strain SF-173. In addition, we found literature evidence for a biotin requirement in E. coli strain C321.∆A and its two genomically recoded derivatives (CP006698.1, CP010455.1, and CP010456.1) (71). Three predicted auxotrophs could grow neither in minimal medium nor in minimal medium supplemented with the corresponding predicted nutrient. For example, E. coli strain RR1 is a predicted L-proline auxotroph, S. dysenteriae strain BU53M1 is a predicted niacin auxotroph, and Y. aleksiciae strain 159 was predicted to have multiple auxotrophies. However, neither exhibited any growth upon supplementation, suggesting that they may have additional nutrient requirements.

Fig. 6. Results of in-house experimental validations for nutrient requirements across 16 Gram-negative strains and outcome of follow-up experiments and analysis for failure cases. Growth curves, PCR primers, literature evidence, and details regarding the genetic basis of auxotrophy can be found in SI Appendix. An asterisk (*) denotes that these strains couldn’t grow upon predicted nutrient supplementation, suggesting that they have additional nutrient requirements. Note that one strain may have multiple auxotrophies.

[Figure 6 table columns: genetic basis (pseudogenes, single gene loss, partial operon loss, full operon loss); strains; missing gene/SNPs; essential nutrients; observation; follow-up results.]

Notably, we tested growth of two Y. ruckeri strains (YRB and NHV_3758) on a reduced chemically defined medium. While there is no precedent for such an effort for this species, a chemically defined medium was nonetheless derived for multiple clinical Y. enterocolitica strains

(which included L-methionine, L-glutamate, glycine, and L-histidine) (72), and another for Y. pestis strains (with 12 amino acids, 3 vitamins, and citrate) (73, 74). Our strain-specific models predicted an auxotrophy for six amino acids: L-phenylalanine, L-methionine, L-cysteine, L-arginine, L-valine, and L-proline, with the first three carrying over from the reference reconstruction for Y. pestis strain CO92. However, the supplementation of M9 with all six nutrients alone could not support growth unless R-pantothenate was also added. This result came as a surprise, since both strains seem to carry an intact R-pantothenate biosynthetic pathway. Consequently, we found severe growth limitations to arise in both strains in the absence of L-methionine, R-pantothenate, and L-isoleucine, with intermediate growth obtained in the absence of L-valine, L-cysteine, or L-arginine. These validated modeling predictions confirm that the approach suggested by D’Souza et al. (8) may miss a few cases due to conservative thresholding (SI Appendix, SI Text).

Follow-up analyses and experiments highlighted links between the remaining four erroneous predictions and technological shortcomings at multiple levels, including 1) wrong sequence annotation (which was corrected by running BLASTn directly on the assembly, S. enterica ser. Bovismorbificans strain 3114), 2) localized low sequencing quality (identified by gene-specific primers, K. pneumoniae strain KP11, S. enterica strains EC20100101 and SA20094177), 3) erroneous assemblies (verified through manual analysis of the deletion regions in S. enterica ser. Bovismorbificans strain 3114), 4) truncated assembly with genes missing at the origin of sequencing (e.g., verified through manual analysis of the deletion regions, S. enterica ser. Bovismorbificans strain 3114), and 5) potential reconstruction knowledge gaps (experimental trial and error and intermediate growth observed for Y. ruckeri strains). In particular, while S. enterica strain 3114 had a high-quality assembly, genes that should have been located near the origin of sequencing were absent, and could only be found via BLASTn. As a result of these observations, we subsequently added one quality control check consisting of a search for the missing CEG in the assembly file via BLASTn. The results are shown in Dataset S2.

Discussion

In this study, we devise an algorithm (AuxoFind) which bypasses user-defined thresholds and pathway definitions using reconstructed genome-scale networks of metabolism and 1,305 quality controlled/quality assured publicly available complete genomic sequences to 1) computationally predict auxotrophies, 2) identify the corresponding metabolic basis, and 3) explore the underlying genetic basis. We further verify 16 of our predictions experimentally and identify the basis for inconsistencies between predictions and observations.

We predict auxotrophies for several amino acids, nucleotides, and vitamins, distinguishing specific from nonspecific nutrient dependencies. Surprisingly, only 38% of predicted auxotrophies were nonspecific. Nonspecific auxotrophs should have a more relaxed flexibility in their ability to grow across nutritional environments with respect to specific auxotrophs while still relying on external nutrient sources. However, such a view does not take into account the strain’s phylogeny which indicates that the strain’s ancestor was prototrophic, and that auxotrophy likely developed as a result of selection pressure directed toward the utilization of a key nutrient in its immediate niche. Indeed, we predict specific auxotrophies for multiple nutrients previously found to be involved in host–pathogen interactions (including BCAAs, L-tryptophan, niacin, and tetrathionate), or which seem to provide a fitness advantage in various niches in vivo (including L-histidine, L-cysteine/tetrathionate, L-tryptophan, niacin, L-glutamine, L-arginine, and L-leucine). Strikingly, we observe that auxotrophies that are beneficial in one environment are detrimental in another. In addition, while the fitness benefits of some auxotrophies carry over multiple stages of bladder infection (such as L-arginine, L-cysteine, and thiamin), those of others (L-leucine and biotin) vary across stages. We hypothesize that these variations reflect differences in nutritional availability between niches and suggest that the context-specific nutritional background likely plays a role in auxotrophy development.

We found that the metabolic basis (including specificity/nonspecificity and multiplicity) of auxotrophies depends on 1) the entire structure of the metabolic pathway, 2) the promiscuity of a protein’s enzymatic activity, and 3) functional or pathway redundancy, and therefore varied in a strain-specific fashion. CEGs carrying out the same function in two different strains can confer a specific auxotrophy in one species but a nonspecific auxotrophy in the other. Additionally, two CEGs participating in the same biosynthetic pathways confer different simulated specificity upon deletion, depending on the position of alternative pathways with respect to that of the CEG. We therefore suggest that selective pressures for auxotrophy development leading to loss of function may affect paralogs differently across strains and vary across CEGs participating in the same pathway as a function of a strain’s full reactome.

We observed a continuity in the complexity of the genetic basis for auxotrophy, ranging from a single nucleotide polymorphism causing a loss-of-function mutation to large multigene and multioperon deletions coupled with extensive homologous recombination events. Interestingly, the only case of a single gene deletion event affected hisD, a gene which was observed to have the largest number of alleles in a pangenome analysis of E. coli strains (36). There were multiple instances in which the loss of CEGs was likely mediated and/or accompanied by prophage insertion and/or insertion sequence movement across the genome, with one strain losing four CEGs due to the insertion of a cluster of genes conferring beta-lactam resistance. In particular, 6 of the 54 predicted auxotrophs had significantly smaller genomes, suggesting that they are niche adapted; this is indeed the case for both S. enterica serovar Newport strain 0112-791 and serovar Paratyphi A strain 9-65. Overall, auxotrophies arising from large-scale deletions (one or more ORFs) are rare (3.8%) in our dataset. They could perhaps be reversed under the right conditions when their genetic basis constitutes small variations such as SNPs (75). However, major events such as full gene deletion and full operon removal are likely to be more permanent and highly constrain the strain’s colonization space and bacterial social network.

Finally, we experimentally verified our predictions for nutrient requirements in 16 strains and observed that 11 strains were auxotrophs, but that minimal media could support growth of 5 mutants. The latter strains served to highlight technological shortcomings at multiple levels. The challenges behind calling genes/functions absent from a genomic sequence became apparent, and the identification of deletions/missing genes is hampered, even in complete sequences, by 1) uneven sequencing quality across the genome, 2) incorrect genome assembly, and 3) erroneous genome annotation. We observed that pangenome alignment (at the ORF level) of closely related prototrophs can be used to overcome these technological shortcomings and distinguish between true and false positives. Knowledge gaps in amino acid biosynthesis of Y. ruckeri, and the presence of unknown in-frame loss of function mutations affecting three strains, constituted additional sources of inconsistency between in silico predictions and experimental observations. These contradictions generate testable hypotheses for follow-up studies (76).

Altogether, our results constitute the most comprehensive systems biology effort aimed at predicting and understanding nutrient auxotrophies using mechanistic models of

8 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1910499117 Seif et al. Downloaded by guest on September 23, 2021 Downloaded by guest on September 23, 2021 7 .Til,B .Plsn rtclfrgnrtn ihqaiygenome-scale high-quality a generating for protocol A Palsson, Ø. B. Thiele, I. 27. 8 .Seif Y. 28. synthetic in exchange for Syntrophic Wang, protozoa H. Price H. N. human M. Church, M. using 26. G. Collins, of J. J. potential Mee, T. The M. therapy: 25. Symbiosis Vaccaro, and commensal E. in D. auxotrophy 24. Histidine Gilsdorf, R. J. Xie, J. Marrs, F. C. Juliao, C. P. 23. 1 .K ost,B .Sokr rmtcdpnetsloel typhimurium auxotrophic salmonella acid amino Aromatic-dependent Tumor-targeting Hoffman, M. Stocker, R. 22. A. B. Hoiseth, K. S. 21. Cabral P. M. ecologies. synthetic 20. and ecosystems Engineering Wang, H. H. Mee, T. M. 19. 8 .J Lloyd J. C. 18. 6 .Si,J .Mn,H ahd,E avs .O aso,Ssesbooyand biology Systems Palsson, O. B. Kavvas, E. Fang Machado, X. H. 17. Monk, M. J. Seif, Y. 16. aux- the of Diversity Seif Y. Denamur, E. 15. Clermont, O. Glodt, J. Bourdelier, and E. survival on Bouvet, nutrition O. and genotype 14. of Influence Botstein, D. Amini, S. Boer, M. V. 13. mark- selection auxotrophic of Testing Ronne, H. research. Johansson, M. applied Hu, Z. and G. fundamental Ulfstedt, M. in 12. strains yeast Auxotrophic Pronk, T. of J. identification 11. molecular and Isolation Chen, X. Huang, Y. Wang, Y. Wang, S. Lv, J. 10. efe al. et Seif h xeietlvldto ehd n odtosaedsrbdin described study. are this Methods. conditions of and and part Materials SI a methods Appendix, as validation tested were experimental strains The Sixteen in described Methods. is and synteny Materials and neighborhood gene of in nutrient mination of in prediction described described for is workflow is detailed auxotrophy The Methods. 
identification of and prediction gene Materials control, SI quality homologous and and collection data CEGs, of procedure specific The Methods and Materials genomic from requirements quickly sequences. to nutrient applied predict be systematically can and developed approach The metabolism. .M mre .K i,M .A-asm .Znlr ewrso nrei and energetic of Networks Zengler, K. Al-Bassam, M. M. Liu, K. J. Embree, M. streamlining 9. of Implications Temperton, D’Souza B. G. Thrash, 8. Cameron J. bacteria in Giovannoni, deficiency biosynthesis J. acid fibro- S. Amino Zhang, cystic 7. L. from Liu, Y. Walker, sputum H. D. of Yu, J. content X. amino-acid 6. high The Pitt, L. of T. auxotrophs Barth, methionine of L. A. Complementation Pitt, 5. L. T. Woodford, N. Barth, L. A. 4. of variants Auxotrophic Pitt, L. T. Barth, of L. Characterization A. Dwivedi, N. 3. S. Das, K. B. Kabra, K. S. Kapil, A. Agarwal, G. 2. Low, K. B. 1. (2019). aureus lococcus reconstruction. metabolic genetics. throughput irba communities. microbial therapy. molecular (2007). nontypeable disease-causing Acids Amino r o-iuetadefciea ievaccines. live as effective and (1981). non-virulent are auxotrophy. (2012). 2470–2483 8, 2018). May (21 bioRXiv:10.1101/327270 pairs. ainshv itntmtblccpblte hteal ooiaino intestinal of colonization enable that capabilities mucosa. metabolic distinct have patients Microbiol. Environ. communities. microbial U.S.A. in Sci. dynamics define interactions metabolic bacteria. in genes biosynthetic ecology. microbial for theory hosts. animal and human with associated auxotrophic of growth Microbiol. promotes patients sis aeruginosa cystic Pseudomonas with patients in respiratory in fibrosis. strains wild-type prototrophic from India. aeruginosa domonas 2001). NY, York, New (Academic, agnm of pangenome traits. metabolic serovar-specific reveal of isolates natural (2017). 891–899 in requirements otrophic yeast. starving of metabolism recombination. 
moss targeted the in use for ers haloarchaeon the for system manipulation Natrinema genetic a develop to mutants auxotrophic M Microbiol. BMC eoesaemtblcrcntutoso utpesloel strains salmonella multiple of reconstructions metabolic Genome-scale al., et opttoa nweg-aeeuiae h epneof response the elucidates knowledge-base computational A al., et .Ci.Microbiol. Clin. J. M yt Biol. Syst. BMC tal., et oe-rvndsg n vlto fnntiilsnhtcsyntrophic synthetic non-trivial of evolution and design Model-driven al., et 55–55 (2015). 15450–15455 112, 1–1 (1996). 110–119 45, iln asi atra mn cdboytei ahaswt high- with pathways biosynthesis acid amino bacterial in gaps Filling al., et p J7-2. sp. esi oe eetv datgscnepantepeaetls of loss prevalent the explain can advantages Selective more: is Less al., et .Benr .H ilr Eds. Miller, H. J. Brenner, S. Genetics, of Encyclopedia in Auxotroph eino ieatnae atra acnsbsdo D-glutamate on based vaccines bacterial attenuated live of Design al., et a.Commun. Nat. 0–2 (2009). 509–521 37, Salmonella shrci coli Escherichia odfeetmdatypes. media different to o.Ther. Mol. 0520 (2002). 2095–2100 68, Archaea sltdfo hoial netdcide ihcsi boi in fibrosis cystic with children infected chronically from isolated LSGenet. PLoS 3(2005). 43 5, physcomitrella rc al cd c.U.S.A. Sci. Acad. Natl. Proc. 6(2018). 66 12, rn.PatSci. Plant Front. rmcsi fibrosis. cystic from a.Protoc. Nat. 74 (1995). 37–40 33, O-antigens. 58 (2017). 15480 8, 814(2015). 483194 2015, Deter- Methods. and Materials SI Appendix, SI SEJ. ISME ampiu influenzae. Haemophilus rc al cd c.U.S.A. Sci. Acad. Natl. Proc. 3–3 (2000). 535–538 2, Evolution 2srispeaeti namtr oe disease bowel inflammatory in prevalent strains B2 1017(2018). e1007147 14, 5316 (2014). 1553–1565 8, 311(2010). 93–121 5, rvdsnwisgt notemcaim of mechanisms the into insights new provides MBio a.Commun. Nat. 5927 (2014). 2559–2570 68, 80(2017). 1850 8, net ee.Evol. Genet. Infect. 0271 (2019). e01247-19 10, ur Microbiol. Curr. 
suooa aeruginosa Pseudomonas LSCmu.Biol. Comput. PLoS shrci coli Escherichia suooa aeruginosa Pseudomonas 24–25 (2014). E2149–E2156 111, 71(2018). 3771 9, .Bacteriol. J. 9063 (2008). 6930–6935 105, amnlatyphimurium. Salmonella Nature 1–1 (2009). 514–517 9, 9–9 (1998). 190–195 36, . Microbiology IApni,SI Appendix, SI rc al Acad. Natl. Proc. 4994–5001 189, IAppendix, SI 238–239 291, e1006644 15, o.Biosyst. Mol. r selected are . .Med. J. Staphy- Appl. Pseu- 163, SI 2 .Ren W. 42. Tattoli I. of 41. branched-chain Controlled network Herskovits, A. A. metabolic Sigal, N. Borovok, the I. Lobel, L. of Brenner, M. Reconstruction 40. al., et Bartell A. J. 39. Wattam R. A. genetic biochemical 38. A BiGG: Palsson, Ø. B. Conrad, M. T. Park, O. J. Schellenberger, J. 37. Monk M. J. 36. predict analysis? to balance flux models Bosi is E. genome-scale What Palsson, 33. Using Ø. B. Palsson, Thiele, I. O. Orth, B. D. updated J. Monk, an 32. M. iCN718, J. Monk, O’Brien, M. J. J. E. Palsson, O. 31. B. Seif, Y. Kavvas, E. Norsigian, J. C. 30. Kavvas S. E. 29. 4 .L oa .K oasn .Mln ovretmtblcseilzto through specialization metabolic Convergent Molin, S. Johansen, K. H. Rosa, La R. 54. Hilliam Y. 53. Danzeisen, J. Nayak, R. Ricke, C. S. Johnson, J. T. Foley, L. S. 52. Vohra P. 51. Phan D. M. 50. of Mediati,“Identification G. D. 49. Grant J. A. 48. Chaudhuri R. R. 47. Chaudhuri R. core- R. rapid for 46. suite harvest The Phillippy, M. A. Koren, S. Ondov, D. B. Treangen, J. T. 45. Winter E. S. 44. Martino Di L. M. 43. King A. Z. 35. Monk M. J. 34. eke,HnigSrm ai oa,SannLnJhsn Craig Johnson, providing Lyn generously for study. Shannon Mehmet Huang this Rebecca Boyd, for Weihua Rozak, strains Dana and bacterial Drs. David Lee, Johnson, to Jun Roger Sørum, Winstanely, Sang grateful Henning Chen, are Shi Nordisk Berkmen, Strockbine, Novo We and Nancy NNF10CC1016517. GM057089, Lindsey, and Grant AI124316 Grants Foundation NIH work by This supported suggestions. 
constructive was providing and manuscript the reviewing ACKNOWLEDGMENTS. are numbers accession and (38), S1. PATRIC in Dataset on analyzed in available available sequences publicly genomic are All study notebook. this example an with GitHub, on in (and article lished Availability. Data nint otdfneprogram. defense host innate an host a as serve effector. to virulence isoleucine and allows signal monocytogenes in auxotrophy acids amino aeruginosa resource. reconstructions. metabolic scale (2010). 213 large 11, of knowledgebase genomic and Biotechnol. Nat. (2010). 245–248 capabilities. biological of reconstruction network metabolic baumannii of genome-scale improved indicative and states flux simulates iEK1011, conditions. physiological H37Rv, tuberculosis mycobacterium 8 (2018). 18. in paths evolutionary distinct lung. bronchiectasis fibrosis cystic (2013). 607 serovars. chicken-associated in adaptation host and of virulence niche-specific cattle. investigate to sequencing of University uropathogenic thesis, PhD infections,” tract (2018). Australia urinary Sydney, Sydney, in Technology plasticity morphological and (2016). 989–997 84, immunodeficient of infection during typhimurium (2013). animals. e1003456 food-producing 9, of colonization intestinal in genes typhimurium mice. BALB/c of (2009). infection for required genes typhimurium genomes. microbial Biol. intraspecific Genome of thousands of visualization and alignment genome Salmonella. (2013). the within ment pathogen. models. genome-scale environments. nutritional U.S.A. Sci. to Acad. adaptations strain-specific highlight strains pathogenicity. to linked U.S.A. Sci. capabilities Acad. metabolic strain-specific identifies M Genom. BMC oprtv eoesaemdligof modelling genome-scale Comparative al., et tal. et mn cdsavto nue yivsv atra ahgn triggers pathogens bacterial invasive by induced starvation acid Amino al., et uli cd Res. Acids Nucleic tal. 
, et iGmdl:Apafr o nertn,sadriigadsharing and standardizing integrating, for platform A models: BiGG al., et ersetv plcto ftasoo-ietdinsertion-site transposon-directed of application Retrospective al., et rn.Immunol. Front. AYE. eoesaemtblcrcntutoso multiple of reconstructions metabolic Genome-scale al., et h eu eitm fagoal ismntdmlirgresistant multidrug disseminated globally a of resistome serum The al., et oitroaevrlnefco ytei.Nt Commun. Nat. synthesis. factor virulence interrogate to M11,akoldeaeta optsEceihacl traits. coli Escherichia computes that knowledgebase a iML1515, al., et Nature ee eurdfrtefins of fitness the for required Genes al., et u namto rvdsarsiaoyeeto cetrfor acceptor electron respiratory a provides inflammation Gut al., et mn cd smdaoso eaoi rs akbtenhs and host between talk cross metabolic of mediators as acids Amino , pae n tnadzdgnm-cl eosrcinof reconstruction genome-scale standardized and Updated al., et 2 (2014). 524 15, ARC h atra iifraisdtbs n analysis and database bioinformatics bacterial the PATRIC, al., et shrci coli Escherichia suooa aeruginosa Pseudomonas opeesv dnicto of identification Comprehensive al., et rn.Genet. Front. 03–04 (2013). 20338–20343 110, (2016). E3801–E3809 113, 0–0 (2017). 904–908 35, l aagnrtdi hssuyaeicue nti pub- this in included are study this in generated data All opeesv sineto oe for roles of assignment Comprehensive al., et EE pathotype. Shigella/EIEC oeua vlto ftenctncai require- acid nicotinic the of evolution Molecular al., et 2–2 (2010). 426–429 467, 0(2019). 20 20, IAppendix SI uli cd Res. Acids Nucleic Cell etakD.KrtnZnlradMr basfor Abrams Marc and Zengler Karsten Dr. thank We M yt Biol. Syst. BMC 7–8 (2015). 971–987 161, 1 (2018). 319 9, 51D9 (2014). D581–D591 42, LSGenet. PLoS suooa aeruginosa. Pseudomonas 2 (2018). 121 9, clone. shrci coli Escherichia elHs Microbe Host Cell u.Rsi.J. Respir. Eur. .AxFn savailable is AuxoFind S1–S6). Datasets and LSGenet. PLoS 5(2018). 25 12, 55D2 (2016). 
D515–D522 44, 1023(2018). e1007283 14, dpainaddvricto ntenon- the in diversification and adaptation n.J e.Microbiol. Med. J. Int. 620 (2017). 1602108 49, ee eurdfrbceilsurvival bacterial for required genes 1084(2013). e1003834 9, NSLts Articles Latest PNAS gp91 irbo.Ml il Rev. Biol. Mol. Microbiol. 6–7 (2012). 563–575 11, tpyoocsaureus Staphylococcus amnlaenterica Salmonella -/- Salmonella amnlaenterica Salmonella Bo 10.1128/mBio.00269- mBio, phox LSPathog. PLoS Salmonella mice. a.Biotechnol. Nat. 43 (2017). 14631 8, yhmru in Typhimurium shrci coli Escherichia net Immun. Infect. Acinetobacter pathogenicity 651–661 303, Pseudomonas M Bioinf. BMC e1000529 5, LSGenet. PLoS Salmonella rc Natl. Proc. rc Natl. Proc. | 582– 77, f10 of 9 serovar serovar strains 28,

55. U. Bergthorsson, J. R. Roth, Natural isolates of Salmonella enterica serovar Dublin carry a single nadA missense mutation. J. Bacteriol. 187, 400–403 (2005).
56. M. Lin et al., Effects of short indels on protein structure and function in human genomes. Sci. Rep. 7, 9313 (2017).
57. S. P. Nuccio, A. J. Bäumler, Comparative analysis of Salmonella genomes identifies a metabolic network for escalating growth in the inflamed gut. MBio 5, e00929-14 (2014).
58. Y. Zhou, Y. Liang, K. H. Lynch, J. J. Dennis, D. S. Wishart, PHAST: A fast phage search tool. Nucleic Acids Res. 39, W347–W352 (2011).
59. K. Nyman, K. Nakamura, H. Ohtsubo, E. Ohtsubo, Distribution of the insertion sequence IS1 in Gram-negative bacteria. Nature 289, 609–612 (1981).
60. A. I. Nilsson et al., Bacterial genome size reduction by experimental evolution. Proc. Natl. Acad. Sci. U.S.A. 102, 12112–12116 (2005).
61. B. P. Anton, A. Fomenkov, E. A. Raleigh, M. Berkmen, Complete genome sequence of the engineered Escherichia coli SHuffle strains and their wild-type parents. Genome Announc. 4, e00230-16 (2016).
62. D. Boyd, C. Manoil, J. Beckwith, Determinants of membrane protein topology. Proc. Natl. Acad. Sci. U.S.A. 84, 8525–8529 (1987).
63. C. Chen et al., Convergence of DNA methylation and phosphorothioation epigenetics in bacterial genomes. Proc. Natl. Acad. Sci. U.S.A. 114, 4501–4506 (2017).
64. H. Jeong, Y. M. Sim, H. J. Kim, S. J. Lee, Unveiling the hybrid genome structure of Escherichia coli RR1 (HB101 RecA+). Front. Microbiol. 8, 585 (2017).
65. A. Wrobel, C. Ottoni, J. C. Leo, S. Gulla, D. Linke, The repeat structure of two paralogous genes, Yersinia ruckeri invasin (yrInv) and a “Y. ruckeri invasin-like molecule”, (yrIlm) sheds light on the evolution of adhesive capacities of a fish pathogen. J. Struct. Biol. 201, 171–183 (2018).
66. C. Bronowski et al., Genomic characterisation of invasive non-typhoidal Salmonella enterica subspecies enterica serovar bovismorbificans isolates from Malawi. PLoS Negl. Trop. Dis. 7, e2557 (2013).
67. G. Labbé et al., Complete genome sequences of 17 Canadian isolates of Salmonella enterica subsp. enterica serovar Heidelberg from human, animal, and food sources. Genome Announc. 4, e00990-16 (2016).
68. R. L. Lindsey et al., High-quality draft genome sequences for four drug-resistant or outbreak-associated Shigella sonnei strains generated with PacBio sequencing and whole-genome maps. Genome Announc. 5, e00906-17 (2017).
69. J. Kim et al., High-quality whole-genome sequences for 59 historical Shigella strains generated with PacBio sequencing. Genome Announc. 6, e00282-18 (2018).
70. W. Huang et al., Emergence and evolution of multidrug-resistant Klebsiella pneumoniae with both blaKPC and blaCTX-M integrated in the chromosome. Antimicrob. Agents Chemother. 61, e00076-17 (2017).
71. T. M. Wannier et al., Adaptive evolution of genomically recoded Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 115, 3090–3095 (2018).
72. N. Amirmozafari, D. C. Robertson, Nutritional requirements for synthesis of heat-stable enterotoxin by Yersinia enterocolitica. Appl. Environ. Microbiol. 59, 3314–3320 (1993).
73. W. J. Brownlow, G. E. Wessman, Nutrition of Pasteurella pestis in chemically defined media at temperatures of 36 to 38 C. J. Bacteriol. 79, 299–304 (1960).
74. J. M. Fowler, R. R. Brubaker, Physiological basis of the low calcium response in Yersinia pestis. Infect. Immun. 62, 5234–5241 (1994).
75. B. E. Wright, M. F. Minnick, Reversion rates in a leuB auxotroph of Escherichia coli K-12 correlate with ppGpp levels during exponential growth. Microbiology 143, 847–854 (1997).
76. J. Monk, J. Nogales, B. O. Palsson, Optimizing genome-scale network reconstructions. Nat. Biotechnol. 32, 447–452 (2014).
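Supplementary sketch. The genome-length comparison described in the Results (a generalized extreme value model of genome lengths, a 5% probability cutoff, and a Fisher's exact test for enrichment of auxotrophs among strains with reduced genomes) can be illustrated with a minimal Python example. This is a sketch under stated assumptions and not the analysis code of the study: the function names, the GEV parameters, the genome lengths, and the 2×2 counts are all hypothetical placeholders, the fitting step is omitted (the parameters are assumed to be already estimated), and a one-sided test is used here although the text does not state which alternative was tested.

```python
from math import comb, exp


def gev_cdf(x, mu, sigma, xi):
    """P(X <= x) for a generalized extreme value distribution.

    mu, sigma (> 0) and xi are location, scale and shape parameters,
    assumed here to have been fitted beforehand.
    """
    z = (x - mu) / sigma
    if xi == 0.0:  # Gumbel limit of the GEV family
        return exp(-exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:  # x lies outside the support of the distribution
        return 0.0 if xi > 0 else 1.0
    return exp(-t ** (-1.0 / xi))


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(top-left cell >= a) under the hypergeometric null,
    i.e. the probability of an enrichment at least as strong as observed.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, col1)
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


# Hypothetical fitted parameters (genome length in Mbp) for one genus.
mu, sigma, xi = 5.0, 0.3, -0.1

# Hypothetical genome lengths; strains under the 5% cutoff count as "reduced".
lengths = {"strain_1": 4.2, "strain_2": 5.1, "strain_3": 4.4}
reduced = [s for s, mbp in lengths.items() if gev_cdf(mbp, mu, sigma, xi) <= 0.05]

# Hypothetical 2x2 table:
#   rows    = predicted auxotroph / not predicted auxotroph
#   columns = reduced genome / non-reduced genome
p_value = fisher_exact_greater(3, 1, 1, 3)
print(reduced, round(p_value, 4))
```

The Fisher test is written with `math.comb` to keep the example free of third-party dependencies; a statistics library would normally be used for both the GEV fit and the test.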
