<<

Metabolic and genetic basis for auxotrophies in Gram-negative species

Yara Seif (a,1), Kumari Sonal Choudhary (a,1), Ying Hefner (a), Amitesh Anand (a), Laurence Yang (a,b), and Bernhard O. Palsson (a,c,2)

(a) Systems Biology Research Group, Department of Bioengineering, University of California San Diego, CA 92122; (b) Department of Chemical Engineering, Queen’s University, Kingston, ON K7L 3N6, Canada; and (c) Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Lyngby, Denmark

Edited by Ralph R. Isberg, Tufts University School of Medicine, Boston, MA, and approved February 5, 2020 (received for review June 18, 2019)

Auxotrophies constrain the interactions of bacteria with their environment, but are often difficult to identify. Here, we develop an algorithm (AuxoFind) using genome-scale metabolic reconstruction to predict auxotrophies and apply it to a series of available genome sequences of over 1,300 Gram-negative strains. We identify 54 auxotrophs, along with the corresponding metabolic and genetic basis, using a pangenome approach, and highlight auxotrophies conferring a fitness advantage in vivo. We show that the metabolic basis of auxotrophy is species-dependent and varies with 1) pathway structure, 2) enzyme promiscuity, and 3) network redundancy. Various levels of complexity constitute the genetic basis, including 1) deleterious single-nucleotide polymorphisms (SNPs), in-frame indels, and deletions; 2) single/multigene deletion; and 3) movement of mobile genetic elements (including prophages) combined with genomic rearrangements. Fourteen out of 19 predictions agree with experimental evidence, with the remaining cases highlighting shortcomings of sequencing, assembly, annotation, and reconstruction that prevent predictions of auxotrophies. We thus develop a framework to identify the metabolic and genetic basis for auxotrophies in Gram-negatives.

systems biology | mathematical modeling | auxotrophy | pangenome | comparative genomics

Significance

Nutrient requirements play an important role in host–microbe interactions, and can heavily affect the composition of microbial communities by setting hard constraints on microbe–microbe interactions. The auxotrophic capabilities of strains and their underlying genetic and metabolic basis have rarely been studied on a large scale. This paper presents a computational approach using genome-scale models of metabolism and genomic sequences to predict auxotrophies in over 1,300 Gram-negative strains. Our network-based approach identifies auxotrophs, their nutrient requirements, and the corresponding causal genetic and metabolic basis, making it one of the most comprehensive efforts in large-scale strain-specific auxotrophy prediction.

Host–pathogen interactions and pathogen–microbiota interactions are dictated by the availability of nutrients as well as the metabolic capability of each participant to transform the nutrients into metabolic energy and biomass components. Many bacterial strains (both commensal and pathogenic) lose the capability to synthesize essential biomass precursors and become dependent on extracellular resources for survival despite having prototrophic ancestors. Some auxotrophies arise as a result of the bacterium’s adaptation to a specific host or environment through the formation of small-scale deleterious mutations leading to gene loss (1). For example, a requirement is common among strains isolated from cystic fibrosis patients (2–4), a requirement likely satisfied by the high concentration of amino acids in the patient’s sputum (5). Nutrient obligate pathogens are often host-associated (6), have a reduced genome (7), and retain a fitness advantage over the free-living bacteria in their specific niche (8). Additionally, auxotrophs dictate the carbon and energy flow as well as the stability of endosymbiotic communities (9). Auxotrophies have been exploited 1) as markers for strain detection and identification (10–12), 2) for the elucidation of their lifestyle and microenvironment (13–17), 3) for the design of microbial ecosystems (18, 19), 4) for the design of attenuated live vaccines (20, 21), and 5) for molecular therapy and tumor targeting (22–24).

The identification of an auxotroph’s nutrient requirements experimentally is difficult, occurs on a strain-by-strain basis, and is rarely accompanied by an identified causative genetic or genomic lesion. Comparative genomic analyses suggest that deleterious disruptions of biomass precursor biosynthetic pathways exist in most free-living bacteria, indicating that they rely on cross-feeding (25). However, it has been demonstrated that auxotrophies are predicted incorrectly as a result of the insufficient number of known gene paralogs (26). Additionally, these methods rely on the identification of pathway completeness, with a 50% cutoff used to determine auxotrophy (25). A mechanistic approach is expected to be more appropriate and can be achieved using genome-scale models of metabolism (GEMs). For example, requirements can arise by means of a single deleterious mutation in a conditionally essential gene (CEG), or as a result of a combination of deletions, in which case they would escape detection via comparative genomics. Given the high interconnectedness of metabolic networks, finding such sets manually is not a trivial task but can be efficiently approached computationally using GEMs.

GEMs are assembled based on genome annotation and curation of published literature (27, 28). They contain the most up-to-date metabolic networks linking reactions with genes according to experimentally validated mechanisms (28–30). Once they are converted into a mathematical format (27, 31), flux balance analysis (32) can be used to identify essential genes and auxotrophies that result from gaps in the network (15, 31, 33, 34). However, a workflow for this purpose has yet to be formalized into a

Author contributions: Y.S. and B.O.P. designed research; Y.S., K.S.C., Y.H., and A.A. performed research; Y.S., Y.H., A.A., L.Y., and B.O.P. contributed new reagents/analytic tools; Y.S. and K.S.C. analyzed data; and Y.S. and K.S.C. wrote the paper.
The authors declare no competing interest.
This article is a PNAS Direct Submission.
This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).
Data deposition: AuxoFind is available on GitHub, with an example notebook. https://github.com/yseif/AuxoFind. All genomic sequences analyzed in this study are publicly available on The Pathosystems Resource Integration Center (PATRIC).
1 Y.S. and K.S.C. contributed equally to this work.
2 To whom correspondence may be addressed. Email: [email protected]
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1910499117/-/DCSupplemental.
First published March 4, 2020.
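As a rough illustration of the distinction drawn in the introduction between essential genes and conditionally essential genes (CEGs), the sketch below replaces flux balance analysis with a much simpler reachability check on a made-up four-reaction network. Every gene, metabolite, and function name here is hypothetical; none of it is taken from AuxoFind or from a curated GEM.

```python
from itertools import combinations

# Hypothetical toy network: (gene, substrates, products). A real GEM would use
# stoichiometric constraints and flux balance analysis instead of reachability.
REACTIONS = [
    ("geneA", {"glucose"}, {"precursor"}),
    ("geneB", {"precursor"}, {"amino_acid"}),
    ("geneC", {"precursor"}, {"cofactor"}),
    ("geneD", {"glucose"}, {"lipid"}),
]
ALL_GENES = {gene for gene, _, _ in REACTIONS}
BIOMASS = {"amino_acid", "cofactor", "lipid"}
MINIMAL_MEDIUM = {"glucose"}
SUPPLEMENTS = ["amino_acid", "cofactor"]  # assumed to be importable nutrients


def producible(genes, medium):
    """Return every metabolite reachable from the medium with the given genes."""
    have = set(medium)
    changed = True
    while changed:
        changed = False
        for gene, substrates, products in REACTIONS:
            if gene in genes and substrates <= have and not products <= have:
                have |= products
                changed = True
    return have


def grows(genes, medium):
    """Growth here simply means that all biomass components are reachable."""
    return BIOMASS <= producible(genes, medium)


def classify_gene(gene):
    """Label a gene by what happens when it alone is removed."""
    remaining = ALL_GENES - {gene}
    if grows(remaining, MINIMAL_MEDIUM):
        return "dispensable", []
    # Look for the smallest supplement sets that restore growth.
    for size in range(1, len(SUPPLEMENTS) + 1):
        rescues = [set(combo) for combo in combinations(SUPPLEMENTS, size)
                   if grows(remaining, MINIMAL_MEDIUM | set(combo))]
        if rescues:
            return "conditionally essential", rescues
    return "essential", []


for gene in sorted(ALL_GENES):
    print(gene, classify_gene(gene))
```

In this toy, losing geneD cannot be rescued by any supplement (essential), whereas losing geneB or geneC is rescued by one nutrient and losing geneA requires two nutrients at once (conditionally essential).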

6264–6273 | PNAS | March 17, 2020 | vol. 117 | no. 11 www.pnas.org/cgi/doi/10.1073/pnas.1910499117

[Fig. 1 graphic: PATRIC database for bacterial genomic sequences (n = 1,435): Escherichia (n = 408), Salmonella (n = 491), Klebsiella (n = 267), Pseudomonas (n = 142), Yersinia (n = 91), Shigella (n = 36); QC/QA of assemblies (No. of genes, No. CDSs, No. ‘N’), n = 1,305; genome annotation; genome-scale metabolic models (iML1515, iJN1411, STM.v1.2, iYL1228, iPAU1129, iPC815); flux balance analysis, conditionally essential genes (CEGs); comparative genomics, presence/absence; AuxoFind(); BLASTn QC/QA; predicted nutrient requirements.]

Fig. 1. Workflow chart: Genomes were downloaded from PATRIC (38) and quality controlled based on completeness, number of annotated coding DNA sequences, and percentage of unassigned nucleotide sequences. Manually curated GEMs were queried from BiGG (35), and used to identify CEGs in minimal medium. Each GEM was used across strains of the same genus, except for iML1515, which was used for both Escherichia and Shigella strains, iJN1411, which was used only for P. putida, and iPAU1129, which was only used for P. aeruginosa. Next, we identified the list of missing metabolic genes in each strain through comparative genomics using genomic sequences as an input. We used AuxoFind to predict auxotrophies and their genetic basis using, as input, the identified list of missing genes. Finally, when a missing gene was linked to a predicted auxotrophy, we verified its absence algorithmically using BLASTn.

mathematical problem and developed into a fully fledged algorithm that can be reused across the community. Here, we develop a custom algorithm (AuxoFind) using comparative genomics coupled with metabolic modeling to computationally predict auxotrophies and pinpoint the corresponding genetic basis.

Results

AuxoFind Predicts Auxotrophy from Genomic Sequences Using GEMs. As a first step toward determining strain-specific auxotrophy, we collected available curated GEMs for Gram-negative bacteria from the Biochemically, Genetically and Genomically structured genome-scale metabolic network reconstructions (BiGG) database (15, 31, 35–37). We proceeded to download and quality check genomic sequences from the PATRIC database (38), including 408 Escherichia, 491 Salmonella, 91 Yersinia, 142 Pseudomonas, 267 Klebsiella, and 36 Shigella sequences (Fig. 1, Dataset S1, and SI Appendix, SI Materials and Methods). For each of the six GEMs, one for each genus, we predicted CEGs for aerobic growth on minimal medium. CEGs differ from absolutely essential genes in that their absence can be compensated by the addition of an extracellular nutrient. In other words, if a strain is missing a CEG, it is auxotrophic for one or more nutrients. In contrast, a strain cannot survive without any one of its essential genes, regardless of the nutritional background. We then mapped all of the modeled genes by homology to other strains within the same genus to identify the strains lacking one or more CEG, and developed a custom algorithm (AuxoFind) which predicts nutrient requirements from a list of present and absent metabolic genes using flux balance analysis (SI Appendix, SI Materials and Methods) (32). AuxoFind exploits the mechanistic link between enzymatic functions and prototrophy encoded in GEMs, taking into account metabolic and genetic redundancy, and using the genomic background of each strain as input. Applying AuxoFind allows the user the flexibility to choose a growth medium and biomass objective function to take into full account a strain’s metabolic environment. This, in turn, allows for the analysis of auxotrophies as a result of changing nutrient sources, or biomass requirements. In addition, AuxoFind can be set to output multiple alternative solutions, as well as suboptimal solutions, instead of returning a single solution. Applying AuxoFind to the strains collected from PATRIC, we predicted a total of 58 strains to be auxotrophic for at least one nutrient, 4 of which (Salmonella enterica serovar Bovismorbificans str. 3114, serovar Enteritidis str. EC20090884, serovar Ouakam, and serovar Paratyphi A strain A73-2) were subsequently filtered out using the Basic Local Alignment Search Tool for nucleotides (BLASTn) (see the experimental validation section for the filtering criteria). The final results included 11 Escherichia strains, 15 Salmonella strains, 18 Yersinia strains, 5 Pseudomonas putida, and 5 Klebsiella strains. The predicted auxotrophies in these 54 strains are analyzed in detail below.

The Majority of Predicted Nutrient Requirements Were Specific. We classified the predicted auxotrophies into two categories: specific and nonspecific. Specific auxotrophies occur when the strain requires a specific nutrient to be added to minimal medium in order to grow, while a strain with a nonspecific auxotrophy can grow when any of a selection of nutrients is added to minimal medium. The requirement for amino acids was found to be predominantly specific (Fig. 2A), while requirements for nucleotides were nonspecific (Fig. 2B). The specificity of amino acid auxotrophy is due to the structure of the metabolic pathways, and the irreversibility of intermediate steps. In contrast, nucleotide biosynthesis (including purine and pyrimidine biosynthesis) can be achieved via multiple routes as well as nucleotide salvage and interconversion. In the latter subsystem, there are multiple redundant pathways, few of which are irreversible, resulting from the promiscuity of participating enzymes. Interestingly, multiple predicted auxotrophies across isolates involved nutrients known to be important in host–pathogen interactions, suggesting that these auxotrophies may give a selective advantage during host–pathogen interactions. For example, specific auxotrophies for branched chain amino acids (BCAAs: L-isoleucine, L-leucine, and L-valine) were shared across 11 strains, 5 of which were isolated from human samples. Intracellular levels of BCAAs play a role in host–pathogen interactions, affecting both pathogenicity and immune activation (40, 41). Similarly, niacin (n = 2) constitutes a critical resource over which the host and the pathogen compete (42), L-tryptophan (n = 5) affects the pathogen’s virulence and its detection by the immune system (43), and tetrathionate (n = 1) is a gut inflammation by-product which is known to provide a respiratory electron acceptor (44). In total, 72% (107 out of 149) of predicted requirements were specific, with some strains having multiple specific and/or nonspecific predicted auxotrophies (Dataset S2). Notably, we observed a

SYSTEMS BIOLOGY

Fig. 2. In silico predictions of metabolic nutrient requirements across multiple Gram-negative species. (A) Nutrients for which a specific auxotrophy was predicted in at least one strain. (B) Top 15 metabolites for which a nonspecific auxotrophy was predicted (see Dataset S2 for the full results). Nutrient requirements were predicted for a total of 54 strains across , Salmonella, Klebsiella, and Yersinia, with amino acids auxotrophies appearing with the highest frequency.
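The distinction between specific and nonspecific requirements can be made concrete with a small classifier over the alternative supplementation sets returned for a strain. This is only a sketch: the function name and the input format are assumptions, and the mixed case (one nutrient versus several nutrients at once) is labeled "pseudospecific" after the usage in the Fig. 4D caption. The final line reproduces the reported share of specific requirements (107 out of 149).

```python
def classify_requirement(alternatives):
    """Classify a strain from its alternative minimal supplementation sets.

    `alternatives` is a list of sets; each set is one way to restore growth
    on minimal medium (hypothetical input format, not AuxoFind's own).
    """
    options = {frozenset(option) for option in alternatives}
    if not options:
        return "prototroph"
    if len(options) == 1:
        # Only one acceptable supplementation (possibly several nutrients).
        return "specific"
    if all(len(option) == 1 for option in options):
        # Any one of several single nutrients restores growth.
        return "nonspecific"
    # One nutrient is interchangeable with a combination of nutrients.
    return "pseudospecific"


examples = {
    "single nutrient": [{"L-leucine"}],
    "either nutrient": [{"L-tryptophan"}, {"indole"}],
    "one versus many": [{"shikimate"},
                        {"L-phenylalanine", "L-tryptophan", "L-tyrosine"}],
}
labels = {name: classify_requirement(alts) for name, alts in examples.items()}
print(labels)

# Share of specific requirements reported in the text: 107 out of 149.
share_specific = round(100 * 107 / 149)
print(f"{share_specific}% specific")
```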

specific L-lysine requirement due to the absence of argD across 11 strains, a specific biotin requirement (due to the absence of either bioAB, fabH, or fabI) across 5 Escherichia coli strains, multiple amino acid auxotrophies in Yersinia ruckeri strains, and an L-leucine requirement in 3 E. coli K-12 strains. To determine which strains were closely related, we constructed phylogenetic trees from the single-nucleotide polymorphism (SNP) count from the concatenation of all core genes using rapid core genome multialignment (ParSNP) (45). While the Y. pestis L-lysine auxotrophs were not clustered together in a single subclade, the L-leucine E. coli auxotrophs, Y. ruckeri amino acid auxotrophs, and E. coli K-12 strains were (SI Appendix, Figs. S1 and S2). Otherwise, predicted auxotrophs were generally spread across the phylogenetic tree, with many subclades containing only one auxotrophic strain.

Amino Acid and Vitamin Auxotrophies Confer a Fitness Advantage In Vivo. We next sought to identify the effect of metabolic requirements on the fitness of auxotrophic strains in their native environment. We evaluated published fitness profiles for E. coli strains UTI89 and EC958 and S. enterica subsp. enterica ser. Typhimurium str. SL1344 mutants across various in vivo and in vitro environments (including cattle, pig and chicken intestine, mouse spleen, human serum, and bladder cell infection model) (46–51). Briefly, fitness profiles are derived through transposon-directed insertion-site sequencing (TRADIS), and a fitness measure is calculated by comparing the number of reads across mutants between the inocula and output samples. We posit that when a strain with a disrupted CEG (as defined in AuxoFind Predicts Auxotrophy from Genomic Sequences Using GEMs) has an increased fitness, the gene’s function is dispensable. In this case, it is evident that the mutant’s auxotrophy is at least partially compensated for by a favorable nutritional background. Conversely, loss of function (and therefore nutrient dependence) resulting in reduced fitness indicates an unfavorable environment and/or insufficient access to important metabolites. Nutrient dependence was predicted in both aerobic and anaerobic conditions for each transposon mutant by AuxoFind, by knocking out the disrupted gene in silico and simulating for auxotrophy (Datasets S3 and S4). TRADIS yields a fitness measure (log2 fold change) for each mutant, which we filtered for adjusted P value smaller than 0.05. A total of 960 measured log2 fold changes passed these thresholds. We considered log2 fold changes in fitness smaller than −1 to be detrimental and those larger than 1 to be beneficial. Only 25 CEGs yielded increased fitness upon disruption in at least one condition, while 70 were detrimental (in ≥1 conditions) (Fig. 3). At the gene level, 5 out of 15 (in E. coli) and 6 out of 12 (in S. enterica) of the CEGs whose disruption increased fitness were lost in one or more natural isolates (highlighted in bold in Fig. 3; Datasets S3 and S4).

Of note, the disruption of two out of five and five out of six CEGs in S. enterica and E. coli, respectively, yielded increased fitness in one condition but decreased fitness in another, suggesting that the conferred fitness advantage is niche-specific. For example, argH and frdD disruption in S. enterica were beneficial in chicken intestine but detrimental in cattle intestine. Additionally, fitness change upon CEG disruption varied across phases of infection. For example, the disruption of bioH in E. coli was advantageous in the intracellular bacterial communities (IBC) phase but detrimental in later phases (dispersal and postdispersal phase). In contrast, leuA disruption yielded decreased fitness in the dispersal phase but increased fitness in the reversal and postreversal phases. Some beneficial nutrient requirements carried over from one stage of infection to the next. For example, L-arginine and L-cysteine auxotrophic mutants exhibited elevated fitness in three consecutive phases including IBC, dispersal, and postdispersal phases. In addition, a thiamin requirement was beneficial in four out of the

five tested phases of infection. Other auxotrophic mutants with increased fitness included mutants requiring L-isoleucine, L-valine, biotin, 4-aminobenzoyl glutamate, L-lysine, and L-leucine (49). In an intestinal infection of cattle with S. enterica, mutants with elevated fitness were auxotrophic for L-arginine (argBH), L-cysteine (cysC), L-histidine (hisBF), nicotinate (nadA), L-leucine (leuB), tetrathionate (frdD, only under anaerobic conditions), and L-tryptophan (trpA).

The metabolic basis for auxotrophies diverges across species as a function of their metabolic network topology and systems-level metabolic capabilities. A subset of genes conferred both specific and nonspecific auxotrophies, depending on both the enzymatic function in the strain-specific metabolic pathway and the species-specific location of the missing routes in the local structure of the network. For example, L-tryptophan biosynthesis can be achieved via three different routes in Escherichia, two in Salmonella, and only one in Yersinia (Fig. 4A). Strains across all three genera have the capability to synthesize L-tryptophan from chorismate (via trpABCDE), while only Escherichia and Salmonella strains are capable of indole transport and utilization, and only Escherichia can synthesize L-tryptophan via both pathways (trpAB and tnaA). As a result, loss of trpCDE confers a nonspecific auxotrophy in Escherichia and Salmonella but a specific auxotrophy in Yersinia, while the loss of trpAB confers a specific auxotrophy in Salmonella and Yersinia but is not conditionally essential in Escherichia. In our dataset, both Escherichia str. D6 and Yersinia aleksiciae str. 159 were missing the full trp operon. However, strain D6 was predicted to be a nonspecific L-tryptophan auxotroph, while strain 159 was predicted to have an L-tryptophan-specific requirement.

We observed cases in which the simultaneous supplementation of multiple nutrients was required to support growth. The multiplicity of nutrient requirements was caused either by the

[Fig. 3 graphic: farm animal infection model (S. enterica strain SL1344; cattle intestine, pig intestine, BALB/c liver) and bladder cell infection model (E. coli strain UTI89; IBC phase, dispersal, post-dispersal, reversal, post-reversal), listing for each condition the required nutrient and the disrupted gene.]

Fig. 3. Computationally predicted CEGs that were found to result in increased fitness (more than one log2 fold change, P value < 0.05) in mutant screens. For each condition tested, we list both the nutrient for which the mutant is predicted to be auxotrophic and the gene which has been disrupted. The fitness profiles were obtained from various sources in which the TRADIS workflow was applied. The bladder cell infection model was designed as a proxy for urinary tract infection. Genes highlighted in bold are lost across natural isolates. An asterisk (*) indicates that CEG knock-out yields a detrimental effect on fitness in other conditions. See Datasets S3 and S4 for the full dataset.
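The thresholds applied to the TRADIS fitness profiles (adjusted P value below 0.05, log2 fold change below −1 counted as detrimental and above 1 as beneficial) can be written as a short filter. The sketch below is illustrative only: the function names, the record layout, and all numeric values are made up, and only the direction of the argH effects (beneficial in chicken intestine, detrimental in cattle intestine) follows the text.

```python
def fitness_effect(log2_fold_change, adjusted_p):
    """Apply the thresholds from the text; None means the measurement is filtered out."""
    if adjusted_p >= 0.05:
        return None
    if log2_fold_change < -1:
        return "detrimental"
    if log2_fold_change > 1:
        return "beneficial"
    return "neutral"


def niche_specific_genes(records):
    """Genes whose disruption is beneficial in one condition and detrimental in another."""
    effects = {}
    for gene, _condition, log2_fc, adj_p in records:
        effect = fitness_effect(log2_fc, adj_p)
        if effect in ("beneficial", "detrimental"):
            effects.setdefault(gene, set()).add(effect)
    return sorted(gene for gene, seen in effects.items()
                  if seen == {"beneficial", "detrimental"})


# Hypothetical measurements: (gene, condition, log2 fold change, adjusted P).
records = [
    ("argH", "chicken intestine", 1.8, 0.010),
    ("argH", "cattle intestine", -2.1, 0.003),
    ("hisD", "pig intestine", 1.4, 0.200),   # fails the P-value filter
    ("thiG", "IBC phase", 1.2, 0.040),
]
print(niche_specific_genes(records))
```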
L-leucine ) ) Yersinia < strains ( ilvC 0.05) ) L- odfeetseis o xml,C2YadA1Vmuta- A111V to and to C128Y restricted to were restricted example, A). 5 restricted were For (Fig. were tions species. ( nadB). mutations D218N different deleterious and to C128Y, of A28V A111V, subsets and included Interestingly, (nadA) found SNPs P219L the and and rohdei, Yersinia and dysenteriae, variations species affected structural The acids 57). (56, amino were deleterious) in 10 be to result than assume validated we more to the which of of likely deletion one are and or least (which mutations at indel carrying these an strains for and/or 71 SNPs dataset of our total searched a found We niacin 55). a 43, in (14, of result strains A111V) natural and in requirement C297A, in C200A, and C113A, C128Y, G74E) and D218N, con- of proof case for dataset. in well-known our However, the auxotrophy for own. niacin analysis such of would its one to this demonstrate of we as level) cept, effort do events, gene we an pseudogenization the predict Here, constitute at to mutations. attempt lesions deleterious not limited genetic smaller-scale currently for of prediction is account identification the (which the for requirements workflow to nutrient our bacterial expand of strains, should host-adapted efforts in development future including auxotrophy strains study to in order observed been aeruginosa has loss-of- small-scale mutations auxotrophy through function adaptation niacin Niche predominant conserved. highly a and in patients fibrosis for reports cystic extensive across despite auxotrophy acid auxotrophic, amino be to predicted Auxotro- were the for of Basis none Genetic the in phies Constitute Mutations Small-Scale requirement shikimate 4D). the (Fig. pseudospecific making predicted, L-tyrosine, is for phenylalanine) requirement (including a nutrients supplementations, acceptable multiple excluded of is set the shikimate from If (aroD). 
dehydratase 3-dehydroquinate fergusonii Escherichia pre- for with was auxotrophy alterna- supplement dicted shikimate to the a example, was which For nutrient nutrients. in one multiple with instances supplementing observed to we tive Finally, 4C). isozyme (Fig. carbamoyltransferase acetylor- ornithine both and of deacetylase absence nithine the to due supplementation or L-arginine confer two to of predicted enterica absence example, was For isozymes) simultaneous auxotrophy. encoding the an (e.g., only genes which more in cases also ing dniyi eei tan of strains in here it in identify described initially was and and of absence biosyn- both the essential of In biosynthesis multiple the in for essential CEG (ilvC) is reductoisomerase a ketol-acid example, of For pathways. participation thetic pathways the different by across distributed or CEGs multiple of absence tan) hs eut eosrt htcnegn evolution convergent that demonstrate results These strains). either in deletions 16 observe aslls ffnto uain in mutations function of loss Causal Shigella .coli E. and enterica, S. pneumoniae, Klebsiella .Interestingly, 4B). (Fig. required is L-isoleucine ilvE hglaflexneri, Shigella .aeruginosa P. e.Nwotsr 3723wr rdce orequire to predicted were 0307-213 str. Newport ser. eels cossrisi utpeseisinclud- species multiple in strains across lost were Klebsiella and tan 3 4.Ised efudta Eswere CEGs that found we Instead, 14). (3, strains .WieteA1Vmutation A111V the While S5). (Dataset strains .enterica S. hglaboydii, Shigella .aeruginosa P. Shigella .dysenteriae S. ilvC PNAS nadA G5, .enterica, S. and t.AC349det h bec of absence the to due ATCC35469 str. eoa neiii tan ocrylarge carry to strains Enteritidis serovar ln,splmnaino both of supplementation alone, 5–4.Ti eutepaie ht in that, emphasizes result This (52–54). | lbilamichiganensis Klebsiella .flexneri S. xedn tt l tan in strains all to it extending Shigella, ac 7 2020 17, March Shigella or .enterica S. 
esnaenterocolitica, Yersinia or nadA .pneumoniae K. .aeruginosa P. nadB .flexneri S. tan,adA8 a restricted was A28V and strains, Shigella hglasonnei, Shigella .coli, E. . mn h pce studied, species the Among icuigW9X P219L, W299X, (including tan,D1NadP219L and D218N strains, o ohi h aeo 5 of case the in both (or eoa uln eonly we Dublin, serovar vln and L-valine | nadB pce nordataset our in species and Shigella, 5) diinly we Additionally, (55). o.117 vol. tytpa,and L-tryptophan, tan sltdfrom isolated strains hr were There coli. E. tanL0 and L201 strain icuigA28V, (including .coli, E. t.R1,and RC10, str. | and pestis, Y. L-isoleucine. o 11 no. .enterica S. lC ilvD, ilvC, L-valine Shigella | 6267 P. S. L-

[Fig. 4 graphic: (A) L-tryptophan biosynthesis in Yersinia, Escherichia, and Salmonella, from chorismate (trpDE, trpD, trpC, trpAB) and from indole (trpAB, tnaA) to L-tryptophan and biomass, with strains 159 and D6 marked. (B) Multi-species pathway from L-threonine and pyruvate (ilvA, ilvB and ilvN or ilvH and ilvI, ilvC, ilvD, ilvE) to L-valine and L-isoleucine. (C) Klebsiella pathway from L-glutamate (argA, argB, argC, argD or astC, argE, argI or argF, argH) through ornithine to L-arginine. (D) Multi-species pathway from 3-dehydroquinate (aroD, aroE (or ydiB), aroK or aroL, aroA and aroC) through shikimate and chorismate to L-phe, L-trp, and L-tyr. Legend: cytoplasm, periplasm, extracellular; predicted auxotroph strain name; nonspecific. Strain labels shown: L201, RC10, G5, ATCC35469, MEM, L021, SA20084644, BL60006, KP11, yzsuk-4.]

Fig. 4. (A) Specificity of L-tryptophan requirement as a function of species-specific systems-level pathway structure. (B) Multiple simultaneous specific auxotrophies as a result of single gene deletion. (C) Auxotrophies as a result of deletion of multiple isozymes. In K. pneumoniae str. KP11, argI and argF are both absent. (D) Pseudospecific auxotrophies in which the alternative to one nutrient requirement (e.g., shikimate) is the simultaneous requirement for multiple nutrients (e.g., L-phenylalanine, L-tryptophan, and L-tyrosine).
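The isozyme case in panel C (argI or argF) and the enzyme-complex labels in panel B (ilvB and ilvN, or ilvH and ilvI) correspond to gene-to-reaction rules written as an "or" of "and" groups. A minimal sketch of how such a rule could be evaluated against the genes present in a strain is given below. The function name, the step labels, and the strain gene sets are assumptions for illustration, not AuxoFind's data model; only the gene groupings are taken from the figure labels.

```python
def reaction_active(rule, present_genes):
    """A rule is a list of alternatives; each alternative is a set of genes
    that must all be present (isozymes are alternatives, subunits share a set)."""
    return any(group <= present_genes for group in rule)


# Step labels are hypothetical; gene groupings follow the Fig. 4 labels.
RULES = {
    "step_argI_or_argF": [{"argI"}, {"argF"}],
    "step_ilvBN_or_ilvIH": [{"ilvB", "ilvN"}, {"ilvH", "ilvI"}],
    "step_ilvC": [{"ilvC"}],
}

full = {"argI", "argF", "ilvB", "ilvN", "ilvH", "ilvI", "ilvC"}
one_isozyme_lost = full - {"argI"}          # argF still covers the step
both_isozymes_lost = full - {"argI", "argF"}  # as reported for str. KP11
subunit_lost = full - {"ilvN"}              # ilvH and ilvI still cover the step

for name, genes in [("one isozyme lost", one_isozyme_lost),
                    ("both isozymes lost", both_isozymes_lost),
                    ("subunit lost", subunit_lost)]:
    lost = [step for step, rule in RULES.items()
            if not reaction_active(rule, genes)]
    print(name, "->", lost)
```

Only the simultaneous loss of both isozymes removes the step, which is why such auxotrophies are invisible when genes are screened one at a time.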

may have led to loss of the nicotinate biosynthesis capability across species. That the deleterious mutations were maintained across descendants indicates that a niacin requirement confers a selective advantage, an observation which is also supported in Amino Acid and Vitamin Auxotrophies Confer a Fitness Advantage In Vivo.

Genomic Basis of Auxotrophy. Next, we proceeded to examine the genomic basis of the auxotrophies predicted from our workflow. To assess the genetic changes at the strain level, we compared the genomic region of each predicted auxotroph with a closely related strain (SI Appendix, SI Materials and Methods). We observed multiple cases in which a missing CEG constituted a part of a larger deleted genomic fragment, in which multiple syntenic operons were lost simultaneously. We found that the number of missing syntenic genes surrounding an absent CEG varied from n = 1 to n = 251 open reading frames (ORFs) averaging 49 genes. S. enterica str. U288 was the only strain for which the missing chromosomal region contained only one CEG (hisD) (Fig. 5B). Conversely, S. enterica str. 0112-791 was missing two genomic regions (n = 251 ORFs and n = 28 ORFs) and did not carry a total of four CEGs.

When the conserved genes flanking the missing genomic fragment were adjacent in the auxotroph, we classified the observed loss as a simple deletion. There were 15 cases of simple multigene deletion events (Fig. 5C). For example, the deleted region in E. coli strains DHB4, C3026, and DH10B consisted of 21 genes, and the genes upstream and downstream of the deletion endpoints in E. coli MC4100 (rapA and fruR) are adjacent in the three auxotrophs. Conversely, K. pneumoniae strain L201 is missing a total of 246 contiguous ORFs with respect to strain L491. One deletion edge was located at position 4,943,680, marking the end of the chromosomal GenBank file, while the other was located at position 371, marking the start of the sequence for plasmid p1-L201. We suspect that an assembly error may explain this observation.

In the remaining instances, the genes which were located at the edges of the missing fragment were separated by multiple ORFs in the auxotroph. Interestingly, these ORFs constituted a prophage in four auxotrophs, suggesting that insertion of viral DNA may have mediated the deletion event (Fig. 5D). In particular, E. coli strain C321.∆A, which is a genomically recoded organism lacking bioAB and ybhB, carried enterobacteria phage λ [containing 15 genes, predicted by PHAge Search Tool (PHAST) (58)] in the deletion locus. Similarly, E. coli strain S50 (isolated from forest soil) carried a prophage sharing three ORFs with phage Stx2 at the locus for 152 missing contiguous genes.

A slightly more complex sequence of events affected S. enterica ser. Newport strain 0211-109 which was isolated from a cow with

gastroenteritis (Dataset S6 and Fig. 5E). At the locus of deletion (consisting of 252 genes), we found an insertion sequence cluster likely conferring beta-lactam resistance (18 ORFs, containing two copies of Class C beta-lactamase, three copies of small multidrug efflux transporters, and three copies of mobile element protein). The auxotroph also carried a DNA internalization-related competence protein ComEC/Rec2 (involved in binding and uptake of transforming DNA) directly downstream of the insertion sequence cluster. Notably, the deletion region (spanning genes between […] and potA) was located immediately upstream of the prophage gifsy 2 (with 56 ORFs) in a close relative (strain 0112-791), but was relocated elsewhere in strain 0211-109. We hypothesize that the genomic rearrangement was caused by the inserted cluster of genes. Similarly, 14 genes are deleted in E. coli strain RR1 (a derivative of K-12), including proA and proB, and 9 genes upstream of the deleted fragment (including 3 transposases) are redistributed across the genome. At the locus of deletion, strain RR1 carries multiple repeat regions denoting that transposition may have occurred. In E. coli strain HST04, the genes flanking the deletion region (pepD and ykfl) are located 2,251 ORFs apart, with an insertion sequence cluster consisting of 16 ORFs located downstream of pepD. Insertion elements can promote the rearrangement of bacterial genomes (59).

Finally, we observed four instances in which the predicted auxotrophs corresponded to species for which there was only one representative genome in our dataset. For example, […]. The absence of CEGs in these cases is likely a result of evolutionary events occurring after speciation (Fig. 5F), and a larger number of pertinent genomic sequences would be necessary to retrace the evolutionary history of these chromosomal regions.

Fig. 5. The genetic basis for nutrient auxotrophy spans various levels of complexity. (A) Niacin auxotrophy due to known loss-of-function mutations in nadA and nadB as well as large in-frame deletions/insertions (>30 amino acids). (B) Single gene deletion of hisD in S. enterica strain T000240. (C) Simple multigene deletion with rejoining of deletion edges. (D) Multigene deletion coupled with homologous recombination mediated by prophages and insertion sequences. (E) Phage insertion and phage-mediated multigene deletion. (F) Divergent and ancient evolutionary trajectory across species.

6268 | www.pnas.org/cgi/doi/10.1073/pnas.1910499117 Seif et al.
PNAS | March 17, 2020 | vol. 117 | no. 11 | 6269

Genome streamlining is often associated with niche adaptation, evolution toward symbiosis, and population bottlenecks (60), and massive gene losses can occur on a small evolutionary timescale. We asked whether the predicted auxotrophs had a reduced genomic sequence length with respect to other strains of the same genus. For each genus, we collected the strains' sequence length and fit the observed genome length distribution to a generalized extreme value distribution as a result of a block maxima approach. We calculated the probability of each value using the distribution, and found that a total of 41 strains in our dataset fell under a probability less than or equal to 5%. Of those, six were predicted auxotrophs (including K. michiganensis strain RC10, K. pneumoniae strains KP11 and yzusk-4, Klebsiella str. G5, S. enterica str. 0112-791, and S. enterica str. 9-65), further supporting the hypothesis that these strains have developed auxotrophy as a result of niche adaptation. However, a Fisher's exact test reveals that there is no significant enrichment of auxotrophs among the population of strains with reduced genomes (P value = 0.6), indicating that genome

streamlining is not a predominant phenomenon across the six genera.

SYSTEMS BIOLOGY

Experimental Validation of Auxotrophies Highlights Technological Shortcomings at Multiple Levels. We proceeded to validate our predictions experimentally by evaluating growth requirements of the predicted auxotrophs. We were able to obtain six E. coli (61–64), three Yersinia (65), three Salmonella (66, 67), three Shigella (68, 69), and one Klebsiella (70) strain (Fig. 6). Out of 16 strains that we experimentally tested, 8 (4 E. coli, 2 Y. ruckeri, 1 S. flexneri, and 1 S. sonnei) grew only when glucose + M9 medium was supplemented with the predicted essential nutrient(s). The confirmed auxotrophies included 1) an L-proline requirement in E. coli strain HST04; 2) an L-leucine requirement in E. coli strains DHB4 and DH10B; 3) a niacin requirement in E. coli strain SF-173, S. flexneri strain 2457T, and S. sonnei strain 2015C-3794; and 4) an L-valine, L-isoleucine, and L-arginine requirement in Y. ruckeri strains YRB and NHV 3758. In addition, we found literature evidence for a biotin requirement in E. coli strain C321.∆A and its two genomically recoded derivatives (CP006698.1, CP010455.1, and CP010456.1) (71). Three predicted auxotrophs could grow neither in minimal medium nor in minimal medium supplemented with the corresponding predicted nutrient. For example, E. coli strain RR1 is a predicted L-proline auxotroph, S. dysenteriae strain BU53M1 is a predicted niacin auxotroph, and Y. aleksiciae strain 159 was predicted to have multiple auxotrophies. However, none of the three exhibited any growth upon supplementation, suggesting that they may have additional nutrient requirements.

Notably, we tested growth of two Y. ruckeri strains on a reduced chemically defined medium. While there is no precedent for such an effort for this species, a chemically defined medium was nonetheless derived for multiple clinical Y. enterocolitica

Genetic basis | Strain | Missing gene/SNPs | Essential nutrients | Observation | Follow-up results
Pseudogenes | E. coli strain SF-173 | nadB (A28V) | Niacin | √ | -
Pseudogenes | S. flexneri 2a str. 2457T | nadA (A111V) | Niacin | √ | -
Pseudogenes | S. sonnei 2015C-3794 | nadA (Deletion (bp = 31)) | Niacin | √ | -
Pseudogenes | S. dysenteriae strain BU53M1 | nadA (P219L) | Niacin | √* | Additional auxotrophies
Single gene loss | K. pneumoniae strain KP11 | argI, purA | Arginine AND Adenine | × | Gene found through PCR primer
Single gene loss | Y. ruckeri strain NHV_3758 | argG | Arginine | √ | -
Single gene loss | Y. ruckeri YRB | argG | Arginine | √ | -
Single gene loss | K. pneumoniae strain KP11 | pyrI, pyrB | Orotate C5H3N2O4 | × | Gene found through PCR primer
Partial operon loss | Y. ruckeri strain NHV_3758 | ilvB, YPO4089 | L-Valine AND L-Isoleucine | √ | -
Partial operon loss | Y. ruckeri YRB | ilvB, YPO4089 | L-Valine AND L-Isoleucine | √ | -
Partial operon loss | Y. aleksiciae strain 159 | pyrF, prsA | Cytosine AND Cytidine AND Deoxyinosine AND L-Histidine AND NMN AND L-Tryptophan | √* | Additional auxotrophies
Partial operon loss | S. enterica serovar Enteritidis str. EC20100101 | thrB | L-Threonine | × | Gene found through PCR primer
Partial operon loss | S. enterica serovar Enteritidis str. SA20094177 | nadA | Niacin | × | Gene found through PCR primer
Full operon loss | E. coli K-12 strain K-12 C3026 | leuACD | L-Leucine | × | Unknown
Full operon loss | E. coli K-12 strain K-12 DHB4 | leuACD | L-Leucine | √ | -
Full operon loss | E. coli str. K-12 substr. DH10B | leuACD | L-Leucine | √ | -
Full operon loss | E. coli strain HST04 | proAB | L-proline | √ | -
Full operon loss | E. coli strain RR1 | proAB | L-proline | √* | Additional auxotrophies
Full operon loss | Y. aleksiciae strain 159 | trpA, trpB, trpC, trpD, trpE, ilvB, ilvN | Uridine AND L-Valine AND L-Isoleucine AND L-Tryptophan AND NMN AND Guanosine AND L-Histidine AND 2',3'-Cyclic CMP | √* | Additional auxotrophies
Full operon loss | S. enterica serovar Bovismorbificans str. 3114 | purDH | Hypoxanthine AND Adenosine AND L-Histidine | × | Gene found through blastn

Fig. 6. Results of in-house experimental validations for nutrient requirements across 16 Gram-negative strains and outcome of follow-up experiments and analysis for failure cases. Growth curves, PCR primers, and details regarding the list of strains can be found in SI Appendix. An asterisk (*) denotes that these strains could not grow upon predicted nutrient supplementation, suggesting that they have additional nutrient requirements. Additionally, we found literature evidence for a biotin requirement in E. coli strain C321.∆A and its two genomically recoded derivatives (71). Note that one strain may have multiple genetic bases of auxotrophy (for example, the Yersinia strains).
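The counts quoted in the text can be cross-checked against Fig. 6 with a few lines of Python. The sketch below is only a bookkeeping aid: the dictionary is transcribed by hand from the figure (one entry per strain, since several strains occupy more than one row), and the outcome labels `confirmed`, `additional`, and `not_confirmed` are names chosen here for the √, √*, and × marks, not terms used by the authors.

```python
from collections import Counter

# One entry per tested strain, transcribed from Fig. 6.
# "confirmed"     = check mark: grew only with the predicted nutrient(s)
# "additional"    = check mark with asterisk: no growth even when supplemented
# "not_confirmed" = cross: predicted auxotrophy was not observed
OUTCOMES = {
    "Escherichia coli SF-173": "confirmed",
    "Shigella flexneri 2457T": "confirmed",
    "Shigella sonnei 2015C-3794": "confirmed",
    "Shigella dysenteriae BU53M1": "additional",
    "Klebsiella pneumoniae KP11": "not_confirmed",
    "Yersinia ruckeri NHV_3758": "confirmed",
    "Yersinia ruckeri YRB": "confirmed",
    "Yersinia aleksiciae 159": "additional",
    "Salmonella enterica EC20100101": "not_confirmed",
    "Salmonella enterica SA20094177": "not_confirmed",
    "Escherichia coli C3026": "not_confirmed",
    "Escherichia coli DHB4": "confirmed",
    "Escherichia coli DH10B": "confirmed",
    "Escherichia coli HST04": "confirmed",
    "Escherichia coli RR1": "additional",
    "Salmonella enterica 3114": "not_confirmed",
}

# Tally strains by outcome and by genus (first word of each label).
by_outcome = Counter(OUTCOMES.values())
by_genus = Counter(name.split()[0] for name in OUTCOMES)

print(len(OUTCOMES), "strains tested")
print(dict(by_outcome))
print(dict(by_genus))
```

Under this transcription the tallies agree with the text: 16 strains in total, 8 confirmed auxotrophs, 3 strains with apparent additional requirements, and 5 strains for which the prediction was not observed, split across six E. coli, three Yersinia, three Salmonella, three Shigella, and one Klebsiella strain.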

strains (with 12 amino acids, which included L-methionine, L-glutamate, glycine, and L-histidine) (72), and another for Y. pestis strains (with 3 amino acids, vitamins, and citrate) (73, 74). Our strain-specific reconstruction predicted an auxotrophy for six amino acids […], with the first three carrying over from the reference reconstruction for Y. pestis strain CO92. However, supplementation of M9 with all six nutrients alone could not support growth unless R-pantothenate was also added. This result came as a surprise, since both strains seem to carry an intact R-pantothenate biosynthetic pathway. Consequently, we found severe limitations in growth in the absence of […], with intermediate growth obtained in the absence of […]. These validated modeling predictions confirm that the approach suggested by D'Souza et al. (8) may miss a few cases due to conservative thresholding (SI Appendix, SI Text).

Follow-up analyses and experiments highlighted links between the remaining four erroneous predictions and technological shortcomings at multiple levels, including 1) wrong annotation (which was corrected by running BLASTn directly on the sequence assembly), 2) localized low sequencing quality (identified by gene-specific primers; K. pneumoniae strain KP11 and S. enterica strains EC20100101 and SA20094177), 3) erroneous assemblies (verified through manual analysis of the deletion regions), 4) truncated assembly with genes missing at the origin of sequencing (verified through manual analysis of the deletion regions; e.g., S. enterica ser. Bovismorbificans strain 3114), and 5) potential reconstruction knowledge gaps (experimental trial and error and intermediate growth observed for Y. ruckeri strains). In particular, while S. enterica ser. Bovismorbificans strain 3114 had a high-quality assembly, genes that should have been absent, and that were located near the origin of sequencing, could be found only via BLASTn. As a result of these observations, we subsequently added one quality control check consisting of a search for the missing CEG in the assembly file via BLASTn. The results are shown in Dataset S2.

Discussion

In this study, we devise an algorithm (AuxoFind) which bypasses user-defined thresholds and pathway definitions, using reconstructed genome-scale networks of metabolism and 1,305 quality controlled/quality assured publicly available complete genomic sequences to 1) computationally predict auxotrophies, 2) identify the corresponding metabolic basis, and 3) explore the underlying genetic basis. We further experimentally verify our predictions and identify the basis for inconsistencies between predictions and observations.

We predict auxotrophies for several amino acids, nucleotides, and […], distinguishing specific from nonspecific dependencies. Surprisingly, only 38% of predicted nutrient auxotrophies were nonspecific. Nonspecific auxotrophs should have a more relaxed flexibility in their ability to grow across nutritional environments with respect to specific auxotrophs while still relying on external nutrient sources. However, such a view does not take into account the strain's phylogeny, which indicates that the strain's ancestor was prototrophic, and that auxotrophy likely developed as a result of selection pressure directed toward the utilization of a key nutrient specific to its immediate niche. Indeed, we found auxotrophies for multiple nutrients previously found to be involved in host–pathogen interactions (including BCAAs, L-tryptophan, niacin, and tetrathionate), or which seem to provide a fitness advantage in various niches in vivo (including glutamine, […], and L-leucine). Strikingly, we observe that auxotrophies that are beneficial in one environment are detrimental in another. In addition, while the fitness benefits of some auxotrophies (such as […], L-cysteine, and thiamin) carry over multiple stages of bladder infection, those of others (L-leucine and biotin) vary across stages. We hypothesize that these variations reflect differences in nutritional availability between niches and suggest that the nutritional background likely plays a context-specific role in auxotrophy development.

We found that the metabolic basis (including specificity/nonspecificity and multiplicity) of auxotrophies depends on 1) the structure of the entire metabolic pathway, 2) the promiscuity of a protein's enzymatic activity, and 3) pathway or functional redundancy, and therefore varied in a strain-specific fashion. CEGs carrying out the same function in two different strains can confer a specific auxotrophy in one species but a nonspecific auxotrophy in the other. Additionally, two CEGs participating in the same biosynthetic pathways confer different specificity upon simulated deletion, depending on the position of the CEG with respect to that of alternative pathways. We therefore suggest that selective pressures for auxotrophy development leading to loss of function may affect paralogs differently across strains and vary across CEGs participating in the same pathway as a function of a strain's full reactome.

We observed a continuity in the complexity of the genetic basis for auxotrophy, ranging from a single nucleotide polymorphism causing a loss-of-function mutation to large multigene and multioperon deletions coupled with extensive homologous recombination events. Interestingly, the only case of a single gene deletion event affected hisD, a gene which was observed to have the largest number of alleles in a pangenome analysis of E. coli strains (36). There were multiple instances in which the loss of CEGs was likely mediated by and/or accompanied by prophage insertion and/or insertion sequence movement across the genome, with one strain losing four CEGs due to the insertion of a cluster of genes conferring beta-lactam resistance. In particular, 6 of the 54 predicted auxotrophs had significantly smaller genomes, suggesting that they are perhaps more niche adapted; this is indeed the case for both S. enterica serovar Newport strain 0112-791 and serovar Paratyphi A strain 9-65. Overall, auxotrophies arising from large-scale deletions (one or more ORFs) […] rare (3.8%) in our dataset. However, small variations such as SNPs could be reversed under the right conditions (75) […] major events such as gene deletion and full operon removal are likely to be more permanent. They could highly constrain the strain's colonization space and social network.

Finally, we experimentally verified our predictions for nutrient requirements in 16 strains and observed that 11 strains were auxotrophs, but that minimal media could support growth of 5 mutants. The latter served to highlight technological shortcomings at multiple levels. The challenges behind calling genes/functions absent from a genomic sequence became apparent, and the identification of deletions/missing genes is hampered, even in complete sequences, by 1) uneven sequencing quality across the genome, 2) incorrect genome assembly, and 3) erroneous genome annotation. We observed that pangenome alignment (at the ORF level) of closely related prototrophs can be used to overcome these technological shortcomings and distinguish between true and false positives. Knowledge gaps in amino acid biosynthesis affecting three strains, and the presence of in-frame mutations of unknown loss of function, constituted additional sources of inconsistency between in silico predictions and experimental observations. These contradictions generate testable hypotheses for follow-up studies (76).

Altogether, our results constitute the most comprehensive systems biology effort aimed at predicting and understanding nutrient auxotrophies using mechanistic models of

metabolism. The approach developed can be applied to quickly and systematically predict nutrient requirements from genomic sequences.

Materials and Methods

The specific procedure of data collection and quality control, prediction of CEGs, and homologous gene identification is described in SI Appendix, SI Materials and Methods. The detailed workflow for prediction of nutrient auxotrophy is described in SI Appendix, SI Materials and Methods. Determination of gene neighborhood and synteny is described in SI Appendix, SI Materials and Methods. Sixteen strains were tested as a part of this study. The experimental validation methods and conditions are described in SI Appendix, SI Materials and Methods.

Data Availability. All data generated in this study are included in this published article (and in SI Appendix and Datasets S1–S6). AuxoFind is available on GitHub, with an example notebook. All genomic sequences analyzed in this study are publicly available on PATRIC (38), and accession numbers are available in Dataset S1.

ACKNOWLEDGMENTS. We thank Dr. Karsten Zengler and Marc Abrams for reviewing the manuscript and providing constructive suggestions. This work was supported by NIH Grants AI124316 and GM057089, and Novo Nordisk Foundation Grant NNF10CC1016517. We are grateful to Drs. Rebecca Lindsey, Nancy Strockbine, Shi Chen, Sang Jun Lee, Dana Boyd, Mehmet Berkmen, Henning Sørum, David Rozak, Shannon Lyn Johnson, Craig Winstanely, Roger Johnson, and Weihua Huang for generously providing bacterial strains for this study.
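The prediction workflow itself is deferred to SI Appendix, so the following is only a toy illustration of the general idea behind model-based auxotrophy prediction: disable the reactions of missing genes, test whether biomass precursors can still be made on minimal medium, and, if not, search for the smallest nutrient supplements that restore growth. It is not AuxoFind and does not use flux balance analysis; it assumes a hypothetical five-reaction network evaluated by simple reachability, and every name in it (`REACTIONS`, `PRECURSORS`, `predict_auxotrophy`, the simplified two-step leucine pathway) is invented for this example.

```python
from itertools import combinations

# Hypothetical toy network: gene -> (substrates, products).
# The leucine branch is collapsed to two steps for brevity.
REACTIONS = {
    "proB": ({"glutamate"}, {"glutamyl_phosphate"}),
    "proA": ({"glutamyl_phosphate"}, {"glutamate_semialdehyde"}),
    "proC": ({"glutamate_semialdehyde"}, {"proline"}),
    "leuA": ({"ketoisovalerate"}, {"isopropylmalate"}),
    "leuB": ({"isopropylmalate"}, {"leucine"}),
}
# Assumption: these precursors are available on glucose minimal medium.
PRECURSORS = {"glutamate", "ketoisovalerate"}
# Assumption: growth requires both amino acids.
BIOMASS = {"proline", "leucine"}
# Nutrients that may be added to the medium.
CANDIDATES = ["leucine", "proline"]


def producible(available, missing_genes):
    """Expand the metabolite pool until no enabled reaction adds anything."""
    pool = set(available)
    changed = True
    while changed:
        changed = False
        for gene, (substrates, products) in REACTIONS.items():
            if gene in missing_genes:
                continue  # reaction lost together with its gene
            if substrates <= pool and not products <= pool:
                pool |= products
                changed = True
    return pool


def can_grow(medium, missing_genes):
    """Growth means every biomass precursor is producible or supplied."""
    return BIOMASS <= producible(medium, missing_genes)


def predict_auxotrophy(missing_genes):
    """Return the smallest supplement sets that restore growth ([] if prototroph)."""
    if can_grow(PRECURSORS, missing_genes):
        return []
    for size in range(1, len(CANDIDATES) + 1):
        rescues = [
            set(combo)
            for combo in combinations(CANDIDATES, size)
            if can_grow(PRECURSORS | set(combo), missing_genes)
        ]
        if rescues:
            return rescues
    return []


print(predict_auxotrophy({"proA", "proB"}))  # loss of proAB, as in strains HST04 and RR1
print(predict_auxotrophy(set()))             # no genes missing
```

In this toy setting, removing proA and proB leaves proline as the only rescuing supplement, which mirrors the L-proline requirement reported for the proAB-deficient E. coli strains. A genome-scale version would replace the reachability check with a growth simulation on a strain-specific model.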

1. B. K. Low, Auxotroph in Encyclopedia of Genetics, S. Brenner, J. H. Miller, Eds. (Academic, New York, NY, 2001).
2. G. Agarwal, A. Kapil, S. K. Kabra, B. K. Das, S. N. Dwivedi, Characterization of Pseudomonas aeruginosa isolated from chronically infected children with cystic fibrosis in India. BMC Microbiol. 5, 43 (2005).
3. A. L. Barth, T. L. Pitt, Auxotrophic variants of Pseudomonas aeruginosa are selected from prototrophic wild-type strains in respiratory infections in patients with cystic fibrosis. J. Clin. Microbiol. 33, 37–40 (1995).
4. A. L. Barth, N. Woodford, T. L. Pitt, Complementation of methionine auxotrophs of Pseudomonas aeruginosa from cystic fibrosis. Curr. Microbiol. 36, 190–195 (1998).
5. A. L. Barth, T. L. Pitt, The high amino-acid content of sputum from cystic fibrosis patients promotes growth of auxotrophic Pseudomonas aeruginosa. J. Med. Microbiol. 45, 110–119 (1996).
6. X. J. Yu, D. H. Walker, Y. Liu, L. Zhang, Amino acid biosynthesis deficiency in bacteria associated with human and animal hosts. Infect. Genet. Evol. 9, 514–517 (2009).
7. S. J. Giovannoni, J. Cameron Thrash, B. Temperton, Implications of streamlining theory for microbial ecology. ISME J. 8, 1553–1565 (2014).
8. G. D'Souza et al., Less is more: Selective advantages can explain the prevalent loss of biosynthetic genes in bacteria. Evolution 68, 2559–2570 (2014).
9. M. Embree, J. K. Liu, M. M. Al-Bassam, K. Zengler, Networks of energetic and metabolic interactions define dynamics in microbial communities. Proc. Natl. Acad. Sci. U.S.A. 112, 15450–15455 (2015).
10. J. Lv, S. Wang, Y. Wang, Y. Huang, X. Chen, Isolation and molecular identification of auxotrophic mutants to develop a genetic manipulation system for the haloarchaeon Natrinema sp. J7-2. Archaea 2015, 483194 (2015).
11. J. T. Pronk, Auxotrophic yeast strains in fundamental and applied research. Appl. Environ. Microbiol. 68, 2095–2100 (2002).
12. M. Ulfstedt, G. Z. Hu, M. Johansson, H. Ronne, Testing of auxotrophic selection markers for use in the moss physcomitrella provides new insights into the mechanisms of targeted recombination. Front. Plant Sci. 8, 1850 (2017).
13. V. M. Boer, S. Amini, D. Botstein, Influence of genotype and nutrition on survival and metabolism of starving yeast. Proc. Natl. Acad. Sci. U.S.A. 105, 6930–6935 (2008).
14. O. Bouvet, E. Bourdelier, J. Glodt, O. Clermont, E. Denamur, Diversity of the auxotrophic requirements in natural isolates of Escherichia coli. Microbiology 163, 891–899 (2017).
15. Y. Seif et al., Genome-scale metabolic reconstructions of multiple salmonella strains reveal serovar-specific metabolic traits. Nat. Commun. 9, 3771 (2018).
16. Y. Seif, J. M. Monk, H. Machado, E. Kavvas, B. O. Palsson, Systems biology and pangenome of Salmonella O-antigens. MBio 10, e01247-19 (2019).
17. X. Fang et al., Escherichia coli B2 strains prevalent in inflammatory bowel disease patients have distinct metabolic capabilities that enable colonization of intestinal mucosa. BMC Syst. Biol. 12, 66 (2018).
18. C. J. Lloyd et al., Model-driven design and evolution of non-trivial synthetic syntrophic pairs. bioRXiv:10.1101/327270 (21 May 2018).
19. M. T. Mee, H. H. Wang, Engineering ecosystems and synthetic ecologies. Mol. Biosyst. 8, 2470–2483 (2012).
20. M. P. Cabral et al., Design of live attenuated bacterial vaccines based on D-glutamate auxotrophy. Nat. Commun. 8, 15480 (2017).
21. S. K. Hoiseth, B. A. Stocker, Aromatic-dependent salmonella typhimurium are non-virulent and effective as live vaccines. Nature 291, 238–239 (1981).
22. R. M. Hoffman, Tumor-targeting amino acid auxotrophic Salmonella typhimurium. Amino Acids 37, 509–521 (2009).
23. P. C. Juliao, C. F. Marrs, J. Xie, J. R. Gilsdorf, Histidine auxotrophy in commensal and disease-causing nontypeable Haemophilus influenzae. J. Bacteriol. 189, 4994–5001 (2007).
24. D. E. Vaccaro, Symbiosis therapy: The potential of using human protozoa for molecular therapy. Mol. Ther. 2, 535–538 (2000).
25. M. T. Mee, J. J. Collins, G. M. Church, H. H. Wang, Syntrophic exchange in synthetic microbial communities. Proc. Natl. Acad. Sci. U.S.A. 111, E2149–E2156 (2014).
26. M. N. Price et al., Filling gaps in bacterial amino acid biosynthesis pathways with high-throughput genetics. PLoS Genet. 14, e1007147 (2018).
27. I. Thiele, B. Ø. Palsson, A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat. Protoc. 5, 93–121 (2010).
28. Y. Seif et al., A computational knowledge-base elucidates the response of Staphylococcus aureus to different media types. PLoS Comput. Biol. 15, e1006644 (2019).
29. E. S. Kavvas et al., Updated and standardized genome-scale reconstruction of mycobacterium tuberculosis H37Rv, iEK1011, simulates flux states indicative of physiological conditions. BMC Syst. Biol. 12, 25 (2018).
30. C. J. Norsigian, E. Kavvas, Y. Seif, B. O. Palsson, J. M. Monk, iCN718, an updated and improved genome-scale metabolic network reconstruction of Acinetobacter baumannii AYE. Front. Genet. 9, 121 (2018).
31. E. J. O'Brien, J. M. Monk, B. O. Palsson, Using genome-scale models to predict biological capabilities. Cell 161, 971–987 (2015).
32. J. D. Orth, I. Thiele, B. Ø. Palsson, What is flux balance analysis? Nat. Biotechnol. 28, 245–248 (2010).
33. E. Bosi et al., Comparative genome-scale modelling of Staphylococcus aureus strains identifies strain-specific metabolic capabilities linked to pathogenicity. Proc. Natl. Acad. Sci. U.S.A. 113, E3801–E3809 (2016).
34. J. M. Monk et al., Genome-scale metabolic reconstructions of multiple Escherichia coli strains highlight strain-specific adaptations to nutritional environments. Proc. Natl. Acad. Sci. U.S.A. 110, 20338–20343 (2013).
35. Z. A. King et al., BiGG models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res. 44, D515–D522 (2016).
36. J. M. Monk et al., iML1515, a knowledgebase that computes Escherichia coli traits. Nat. Biotechnol. 35, 904–908 (2017).
37. J. Schellenberger, J. O. Park, T. M. Conrad, B. Ø. Palsson, BiGG: A biochemical genetic and genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinf. 11, 213 (2010).
38. A. R. Wattam et al., PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Res. 42, D581–D591 (2014).
39. J. A. Bartell et al., Reconstruction of the metabolic network of Pseudomonas aeruginosa to interrogate virulence factor synthesis. Nat. Commun. 8, 14631 (2017).
40. M. Brenner, L. Lobel, I. Borovok, N. Sigal, A. A. Herskovits, Controlled branched-chain amino acids auxotrophy in listeria monocytogenes allows isoleucine to serve as a host signal and virulence effector. PLoS Genet. 14, e1007283 (2018).
41. I. Tattoli et al., Amino acid starvation induced by invasive bacterial pathogens triggers an innate host defense program. Cell Host Microbe 11, 563–575 (2012).
42. W. Ren et al., Amino acids as mediators of metabolic cross talk between host and pathogen. Front. Immunol. 9, 319 (2018).
43. M. L. Di Martino et al., Molecular evolution of the nicotinic acid requirement within the Shigella/EIEC pathotype. Int. J. Med. Microbiol. 303, 651–661 (2013).
44. S. E. Winter et al., Gut inflammation provides a respiratory electron acceptor for Salmonella. Nature 467, 426–429 (2010).
45. T. J. Treangen, B. D. Ondov, S. Koren, A. M. Phillippy, The harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. Genome Biol. 15, 524 (2014).
46. R. R. Chaudhuri et al., Comprehensive identification of Salmonella enterica serovar typhimurium genes required for infection of BALB/c mice. PLoS Pathog. 5, e1000529 (2009).
47. R. R. Chaudhuri et al., Comprehensive assignment of roles for Salmonella typhimurium genes in intestinal colonization of food-producing animals. PLoS Genet. 9, e1003456 (2013).
48. A. J. Grant et al., Genes required for the fitness of Salmonella enterica serovar typhimurium during infection of immunodeficient gp91-/- phox mice. Infect. Immun. 84, 989–997 (2016).
49. D. G. Mediati, "Identification of Escherichia coli genes required for bacterial survival and morphological plasticity in urinary tract infections," PhD thesis, University of Technology Sydney, Sydney, Australia (2018).
50. M. D. Phan et al., The serum resistome of a globally disseminated multidrug resistant uropathogenic Escherichia coli clone. PLoS Genet. 9, e1003834 (2013).
51. P. Vohra et al., Retrospective application of transposon-directed insertion-site sequencing to investigate niche-specific virulence of Salmonella Typhimurium in cattle. BMC Genom. 20, 20 (2019).
52. S. L. Foley, T. J. Johnson, S. C. Ricke, R. Nayak, J. Danzeisen, Salmonella pathogenicity and host adaptation in chicken-associated serovars. Microbiol. Mol. Biol. Rev. 77, 582–607 (2013).
53. Y. Hilliam et al., Pseudomonas aeruginosa adaptation and diversification in the non-cystic fibrosis bronchiectasis lung. Eur. Respir. J. 49, 1602108 (2017).
54. R. La Rosa, H. K. Johansen, S. Molin, Convergent metabolic specialization through distinct evolutionary paths in Pseudomonas aeruginosa. mBio, 10.1128/mBio.00269-18 (2018).
55. U. Bergthorsson, J. R. Roth, Natural isolates of Salmonella enterica serovar Dublin carry a single nadA missense mutation. J. Bacteriol. 187, 400–403 (2005).
56. M. Lin et al., Effects of short indels on protein structure and function in human genomes. Sci. Rep. 7, 9313 (2017).
57. S. P. Nuccio, A. J. Bäumler, Comparative analysis of Salmonella genomes identifies a metabolic network for escalating growth in the inflamed gut. MBio 5, e00929–14 (2014).
58. Y. Zhou, Y. Liang, K. H. Lynch, J. J. Dennis, D. S. Wishart, PHAST: A fast phage search tool. Nucleic Acids Res. 39, W347–W352 (2011).
59. K. Nyman, K. Nakamura, H. Ohtsubo, E. Ohtsubo, Distribution of the insertion sequence IS1 in Gram-negative bacteria. Nature 289, 609–612 (1981).
60. A. I. Nilsson et al., Bacterial genome size reduction by experimental evolution. Proc. Natl. Acad. Sci. U.S.A. 102, 12112–12116 (2005).
61. B. P. Anton, A. Fomenkov, E. A. Raleigh, M. Berkmen, Complete genome sequence of the engineered Escherichia coli SHuffle strains and their wild-type parents. Genome Announc. 4, e00230-16 (2016).
62. D. Boyd, C. Manoil, J. Beckwith, Determinants of membrane protein topology. Proc. Natl. Acad. Sci. U.S.A. 84, 8525–8529 (1987).
63. C. Chen et al., Convergence of DNA methylation and phosphorothioation epigenetics in bacterial genomes. Proc. Natl. Acad. Sci. U.S.A. 114, 4501–4506 (2017).
64. H. Jeong, Y. M. Sim, H. J. Kim, S. J. Lee, Unveiling the hybrid genome structure of Escherichia coli RR1 (HB101 RecA+). Front. Microbiol. 8, 585 (2017).
65. A. Wrobel, C. Ottoni, J. C. Leo, S. Gulla, D. Linke, The repeat structure of two paralogous genes, Yersinia ruckeri invasin (yrInv) and a "Y. ruckeri invasin-like molecule", (yrIlm) sheds light on the evolution of adhesive capacities of a fish pathogen. J. Struct. Biol. 201, 171–183 (2018).
66. C. Bronowski et al., Genomic characterisation of invasive non-typhoidal Salmonella enterica subspecies enterica serovar bovismorbificans isolates from Malawi. PLoS Negl. Trop. Dis. 7, e2557 (2013).
67. G. Labbé et al., Complete genome sequences of 17 Canadian isolates of Salmonella enterica subsp. enterica serovar Heidelberg from human, animal, and food sources. Genome Announc. 4, e00990-16 (2016).
68. R. L. Lindsey et al., High-quality draft genome sequences for four drug-resistant or outbreak-associated Shigella sonnei strains generated with PacBio sequencing and whole-genome maps. Genome Announc. 5, e00906-17 (2017).
69. J. Kim et al., High-quality whole-genome sequences for 59 historical Shigella strains generated with PacBio sequencing. Genome Announc. 6, e00282-18 (2018).
70. W. Huang et al., Emergence and evolution of multidrug-resistant Klebsiella pneumoniae with both blaKPC and blaCTX-M integrated in the chromosome. Antimicrob. Agents Chemother. 61, e00076-17 (2017).
71. T. M. Wannier et al., Adaptive evolution of genomically recoded Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 115, 3090–3095 (2018).
72. N. Amirmozafari, D. C. Robertson, Nutritional requirements for synthesis of heat-stable enterotoxin by Yersinia enterocolitica. Appl. Environ. Microbiol. 59, 3314–3320 (1993).
73. W. J. Brownlow, G. E. Wessman, Nutrition of Pasteurella pestis in chemically defined media at temperatures of 36 to 38 C. J. Bacteriol. 79, 299–304 (1960).
74. J. M. Fowler, R. R. Brubaker, Physiological basis of the low calcium response in Yersinia pestis. Infect. Immun. 62, 5234–5241 (1994).
75. B. E. Wright, M. F. Minnick, Reversion rates in a leuB auxotroph of Escherichia coli K-12 correlate with ppGpp levels during exponential growth. Microbiology 143, 847–854 (1997).
76. J. Monk, J. Nogales, B. O. Palsson, Optimizing genome-scale network reconstructions. Nat. Biotechnol. 32, 447–452 (2014).
