Downloaded by guest on October 2, 2021 oeso elsuidlc,t hrceiesnhtcpromoters biophysical synthetic infer characterize to to loci, used well-studied been of have models technologies These meta- and zoans. yeast, , of in architecture sequences functional regulatory transcriptional the dissecting for developed been have -specific of readily purification proteins. not regulatory require are often methods these and (12)], pro- parallelized Busby and dissecting the Minchin in by for [reviewed and work promoters developed bacterial genetic individual at classic been function moter now has DNA-bound of methods between modulate arsenal biochemical interactions (RNAP) an polymerase which Although RNA . in and way factors transcription the binding factor or transcription nucleotide- functional sites the all either of revealing location in of resolution regulation incapable gene are methods study increas- these to are used techniques being high-throughput immunoprecipita- ingly other Chromatin and others. (ChIP) and tion bacteria these in ture (3–5). mechanisms regulatory or as such a Moradian Annie bacteria Belliveau M. in Nathan regulation transcriptional of molecular mechanisms the dissecting for approach Systematic www.pnas.org/cgi/doi/10.1073/pnas.1722055115 Fig. Appendix, (SI regulated are genes S1 how the or of if one-half indication than no have more still we understood, Even best is eukaryotes. regulation ignorance to archaea This of to regulated. bacteria in are understanding to genomes viruses sequence-level these from extends in a genes outpacing the how far is genomes T chromatography affinity DNA regulation gene of mechanisms the in dissecting function quantitatively promoter of possibility the opens it up extract here, presented approach further the of generality the to Given relation-ships. us input–output of describing allowed models models biophysical approach quantitative previously nucleotide-resolution the including recover ones, promoters, unannotated some we For cases, mechanism. promoter all bacterium enteric In the coli. previously in and well-studied promoters both uncharacterized on systematic approach a this in show promoters We bacterial way. multiple modeling dissect to reporter information-theoretic used be parallel and can massively spectrometry, of mass combination a assays, dissec- how multipromoter toward show step and first tion a take we regulatory Here, low-throughput individual methods. using efforts which focused requires by character- operate mechanisms present, sequences of At all molecular regulated. almost the are how genomes izing about these in ignorant genes remain continues the we genomes biol- rapidly, bacterial in expand of processes 2017) to 19, catalog ubiquitous December the most review while for the However, (received of 2018 ogy. one 6, April is approved regulation and NJ, Gene Princeton, University, Princeton Jr., Callan G. Curtis by and Edited 11724; NY Harbor, Spring Cold Laboratory, Harbor Spring Cold d 91125; CA Pasadena, Technology, iiino ilg n ilgclEgneig aionaIsiueo ehooy aaea A91125; CA Pasadena, Technology, of Institute California Engineering, Biological and Biology of Division rtoeEpoainLbrtr,BcmnIsiue aionaIsiueo ehooy aaea A91125; CA Pasadena, Technology, of Institute California Institute, Beckman Laboratory, Exploration Proteome nrcn er,avreyo asvl aallrpre assays reporter parallel massively of variety a years, recent In architec- regulatory studying for needed are approaches New RglnB()o cCc().I te oe bacteria, model other In (2)]. EcoCyc or (1) [RegulonDB ) h oe raimi hc transcriptional which in model the coli, Escherichia a ee ee aeestablished have genes fewer far aeruginosa, Pseudomonas esqecn eouinhslf niswk nenor- an wake its sequenced in of catalog left expanding rapidly has the challenge: revolution mous sequencing he ailssubtilis , Bacillus | asvl aallrpre assay reporter parallel massively d oj Hess Sonja , .coli E. a tpai .Barnes L. Stephanie , albce crescentus, Caulobacter | n ierneo te bacteria. other of range wide a and asspectrometry mass c eateto eladMlclrBooy cec o ieLbrtr,UpaaUiest,712 psl,Sweden; Uppsala, 24 751 University, Uppsala Laboratory, Life for Science Biology, Molecular and Cell of Department d,1 utnB Kinney B. Justin , | uniaiemodels quantitative .coli E. a ila .Ireland T. William , iroharveyii , Vibrio 61) but (6–11), e Escherichia n o Phillips Rob and , f eateto ple hsc,Clfri nttt fTcnlg,Psdn,C 91125 CA Pasadena, Technology, of Institute California Physics, Applied of Department | b hroRWms pcrmtyfie aebe eoie ntejOTReposi- jPOST the in deposited been have files spectrometry tory, mass RAW Thermo ilpooesa uloierslto sn obnto of massively combination A a measurements. biochemical using and resolution functional, genetic, nucleotide identifying at bacte- for uncharacterized promoters approach previously rial of systematic architecture functional a the describe and dissection established. been yet uncharacter- has previously to sequences regulatory of technologies ized no mechanisms reporter functional However, parallel the (20). massively decipher using cells for mammalian approach in – new interactions longer-range assays for identifying promoter for CRISPR search promise shown to (13–19). also have and sequences sites, regulatory binding transcriptional known from constructed esddt n yhncd soito ihdt nlssadpotn have plotting and analysis data with at association (available Zenodo code and PBoC/sortseq GitHub python on deposited and been data cessed from data BioSample, sequencing regulatory Seq 1073/pnas.1722055115/-/DCSupplemental. at online information supporting contains article This 2 1 Sequence NCBI the on deposited been have under files Archive, sequencing distributed Read Raw is deposition: Data article BY-NC-ND). (CC 4.0 access License NonCommercial-NoDerivatives open This Submission. Direct PNAS a is article provided This interest. J.B.K. of paper. conflict the and no wrote R.P. declare M.J.S., and authors W.T.I., J.B.K., The S.L.B., N.M.B., N.M.B., funding; software; acquired W.T.I., R.P. provided and N.M.B., S.H. J.B.K investigation; validation; provided and N.M.B. W.T.I., M.J.S., methodology; S.L.B., D.L.J., J.B.K. provided N.M.B., and conceptualization; J.B.K. A.M., provided and D.L.J., R.P. M.J.S., W.T.I., D.L.J., and S.L.B., N.M.B., S.L.B., N.M.B., research; N.M.B., performed data; research; J.B.K. analyzed designed and R.P. S.H., and A.M., D.L.J., M.J.S., N.M.B., contributions: Author owo orsodnesol eadesd mi:[email protected]. Email: addressed. be should correspondence whom To Gaithers- MedImmune, Engineering, 20878. Protein MD and burg, Discovery Antibody address: Present ailL Jones L. Daniel , n h ehnsso rmtrfnto nawd ag of range wide a in function promoter bacteria. of dissect- mechanisms quantitatively of the possibility ing the up opens it approach, unanno- and well-characterized in both promoters in tated shown in mecha- was results promoter and nism of approach models The pair-resolution chro- sequences. base use quantitative these bind then regulatory that the We proteins identify to sequences. spectrometry mass regulatory and be matography identify can reporter to modeling parallel information-theoretic used we massively and are a rapidly, Sort-Seq, genomes how these assay, expands show in we genomes genes Here, the regulated. of how How- about catalog environment. in ignorant or remain the state decisions cellular while regulatory in ever, change make a to constantly response must Significance ee etk rtse oadqatttv,multipromoter quantitative, toward step first a take we Here, https://repository.jpostdb.org a,b,f,2 https://www.ncbi.nlm.nih.gov/biosample belliveau https://www.ncbi.nlm.nih.gov/sra and c ihe .Sweredoski J. Michael , https://doi.org/10.5281/zenodo.1184169 ie h eeaiyo the of generality the Given coli. Escherichia b eateto hsc,Clfri nttt of Institute California Physics, of Department e acsinno. (accession iosCne o uniaieBiology, Quantitative for Center Simons shrci coli Escherichia acsinno. (accession PXD007892 www.pnas.org/lookup/suppl/doi:10. https://www.github.com/RPGroup- raieCmosAttribution- Commons Creative NSLts Articles Latest PNAS acsinno. (accession a endpstdi NCBI in deposited been has d , .Flscnann pro- containing Files ). SRP121362 respectively). , SAMN07830099). n Sort- and ) | f10 of 1

BIOPHYSICS AND COMPUTATIONAL BIOLOGY parallel reporter assay [Sort-Seq (13)] is performed on a pro- A moter in multiple growth conditions to identify functional tran- scription factor binding sites. DNA affinity chromatography and mass spectrometry (21, 22) are then used to identify the reg- ulatory proteins that recognize these sites. In this way, one is able to identify both the functional binding sites and cognate transcription factors in previously unstudied promoters. Subsequent massively parallel assays are then per- formed in gene deletion strains to provide additional validation B of the identified regulators. The reporter data thus generated are also used to infer sequence-dependent quantitative models of transcriptional regulation. In what follows, we first illustrate the overarching logic of our approach through application to four A

C previously annotated promoters: lacZYA, relBE, marRAB, and - G yebG. We then apply this strategy to the previously uncharacter- T ized promoters of purT, xylE, and dgoRKADT, showing the ability to go from regulatory ignorance to explicit quantitative models of a promoter’s input–output behavior. C Results To dissect how a promoter is regulated, we begin by perform- ing Sort-Seq (13). As shown in Fig. 1A, Sort-Seq works by first generating a library of cells, each of which contains a mutated promoter that drives expression of green fluorescent protein (GFP) from a low copy plasmid [5–10 copies per cell (23)] and provides a readout of transcriptional state. We use fluorescence- activated cell sorting (FACS) to sort cells into multiple bins gated Fig. 1. Overview of the approach to characterize transcriptional regulatory by their fluorescence level and then sequence the mutated plas- DNA using Sort-Seq and mass spectrometry. (A) Schematic of Sort-Seq. A mids from each bin. We found it sufficient to sort the libraries promoter plasmid library is placed upstream of GFP and is transformed into into four bins and generated datasets of about 0.5–2 million cells. The cells are sorted into four bins by FACS, and after regrowth, plas- sequences across the sorted bins (SI Appendix, Fig. S3 A–D). mids are purified and sequenced. The entire intergenic region associated To identify putative binding sites, we calculate “expression shift” with a promoter is included on the plasmid, and a separate downstream plots that show the average change in fluorescence when each ribosomal binding site sequence is used for translation of the GFP gene. position of the regulatory DNA is mutated (Fig. 1B, Left). Muta- The fluorescence histograms show the fluorescence from a library of the rel promoter and the resulting sorted bins. (B) Regulatory binding sites tions to the DNA will, in general, disrupt binding of transcription are identified by calculating the average expression shift due to mutation factors (24), and therefore, regions with a positive shift are sug- at each position. In the schematic, positive expression shifts are sugges- gestive of binding by a , while a negative shift suggests tive of binding by , while negative shifts would suggest binding binding by an or RNAP. by an activator or RNAP. Quantitative models can be inferred to describe The identified binding sites are further interrogated by per- and further interrogate the associated DNA–protein interactions. An exam- forming information-based modeling with the Sort-Seq data. ple energy matrix that describes the binding energy between an as yet Here, we generate energy matrix models (13, 25) that describe unknown transcription factor (TF) and the DNA is shown. By convention, the sequence-dependent energy of interaction of a transcription the wild-type nucleotides have zero energy, with blue squares identify- factor at each putative binding site. For each matrix, we use a ing mutations that enhance binding (negative energy) and red squares identifying mutations that reduce binding (positive energy). The wild-type convention that the wild-type sequence is set to have an energy sequence is written above the matrix. (C) DNA affinity chromatography and of zero (an example energy matrix is in Fig. 1B, Right). Mutations mass spectrometry are used to identify the putative transcription factor for that enhance binding are identified in blue in Fig. 1B, while muta- an identified repressor site. DNA oligonucleotides containing the target tions that weaken binding are identified in red in Fig. 1B. We binding site are tethered to magnetic beads and used to purify the tar- also use these energy matrices to generate sequence logos (26), get transcription factor from cell lysate. Protein abundance is determined which provide a useful visualization of the sequence specificity by mass spectrometry, and a protein enrichment is calculated as the ratio (Fig. 1B, above energy matrix). in abundance relative to a second reference experiment where the target To identify the putative transcription factors, we next perform sequence is mutated away. DNA affinity chromatography experiments using DNA oligo- nucleotides containing the binding sites identified by Sort-Seq. Here, we apply a stable isotopic labeling of cell culture [SILAC then allow us to probe the Sort-Seq data further through addi- (27–30)] approach, which enables us to perform a second refer- tional information-based modeling using thermodynamic models ence affinity chromatography that is simultaneously analyzed by of gene regulation. As further validation of binding by an iden- mass spectrometry. We perform chromatography using magnetic tified regulator, we also perform Sort-Seq experiments in gene beads with tethered oligonucleotides containing the putative deletion strains, which should no longer show the associated binding site (Fig. 1C). Our reference purification is performed positive or negative shift in expression at their binding site. identically, except that the binding site has been mutated away. The abundance of each protein is determined by mass spec- Sort-Seq Recovers the Regulatory Features of Well-Characterized trometry and used to calculate protein enrichment ratios, with Promoters. To first show Sort-Seq as a tool to discover regula- the target transcription factor expected to exhibit a ratio greater tory binding sites de novo, we began by looking at the promoters than one. The reference purification ensures that nonspecifically of lacZYA (lac), relBE (rel), and marRAB (mar). These promot- bound proteins will have a protein enrichment near one. This ers have been studied extensively (31–33) and provide a useful mass spectrometry data and the energy matrix models provide testbed of distinct regulatory motifs. To proceed, we constructed insight into the identity of each regulatory factor and poten- libraries for each promoter by mutating their known regulatory tial regulatory mechanisms. In certain instances, these insights binding sites (SI Appendix, Fig. S3 E and F shows additional

2 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1722055115 Belliveau et al. Downloaded by guest on October 2, 2021 Downloaded by guest on October 2, 2021 elva tal. et Belliveau hc ecnie ee n APrcpo rti (CRP) protein receptor cAMP a of two and sites, here, binding consider (LacI) we repressor the which Lac considering three contains by which begin We characterization). RNAP. and MarA RegulonDB. in for those shown on based are (repres- are logos MarR sites and sequence binding RNAP, Annotated and (activators), matrices MarA 30 and Energy Fis sor). at of LB sites matrices in binding energy known grown and were (C (repressor), these. cells for RelBE Here, shown and are RNAP logos sequence of and sites 37 binding at the glucose 0.5% tify for with shown media the are minimal of logos M9 Sort-Seq sequence (B) site. and and binding matrices (−10 each Energy RNAP noted. (activator), (repressor) CRP LacI for sites binding 37 tated at glucose 0.5% with media ( promoters. i.2. Fig. A C B hrceiaino h euaoylnsaeo the of landscape regulatory the of Characterization otSqo the of Sort-Seq A) lac ◦ .Epeso hfsaeson ihanno- with shown, are shifts Expression C. rmtr el eegoni 9minimal M9 in grown were Cells promoter. ◦ rel .Teepeso hfsietf the identify shifts expression The C. rmtr el eeas rw in grown also were Cells promoter. otSqo the of Sort-Seq ) ◦ .Teepeso hfsiden- shifts expression The C. 5sbie) and subsites), −35 lac lac, mar and rel, promoter, promoter. mar idtp tandt r u obnigb hs transcription these by binding to the due in features are factors. observed data (SI the strain strain that deletion wild-type the suggesting of in S7), inability data Fig. an the Appendix, and explain associ- to 3) shift matrices (Fig. expression energy repressors the the these of by loss binding a with in performing ated resulted this by case, data each In the our where validated to strains in which we Sort-Seq with instead, specificity Here, sequence compare. their characterized that able coefficient correlation Appendix). (SI (Pearson for 0.5–0.9) sites factor binding genomic transcription known each from in generated those shown with are well These of Fis. and sites and 2 binding MarA, Fig. RelBE, the energy CRP, for inferred LacI, logos of we RNAP, sequence features Here, associated quantitative factor. and the transcription matrices recover important each also was by could it binding we plot, that shift show expression to each in behavior necessarily ulatory not and here. assays considered vitro condition growth in the in under expected shown were computationally were or the factor, overexpres- predicted, However, transcription required associated (1). the annotations of AcrR these sion and with binding CpxR/CpxA, associated reported Cra, studies been CRP, also be native for have the may sites There including which present, site. site, features binding binding other ribosomal with its overlap along expres- to observed negative due and were binding positive shifts Both noted sion site apparent. the binding especially MarR with downstream not the consistent was that be was at exception to LB One seemed sites. in 2C) libraries (Fig. grew plot instead, and media 30 minimal quite M9 was fluorescence in promoter’s dim marbox the by that this repression found under We of is (33). MarR upstream it growth, binds laboratory standard which Under (38). Fis, and site) by binding enhanced marbox so-called further the (via Rob and SoxS, MarA, utdu eitneeflxpm,AcBTl,adincreases major and the The including ArcAB-TolC, (33). tolerance genes, pump, antibiotic of efflux variety resistance a multidrug factor activates transcription the which for codes MarA, its since resistance, expression. otic in shift negative the a a and showed RelBE, particular, for mainly In site site binding binding RelBE. the RNAP at and observed RNAP was 2B shift by Fig. positive mini- in binding shown M9 with are in shifts consistent grown expression cells The with media. mal Sort-Seq (37). promoter performed own of its a similarly of cleavage repressor contains We a also is through and protein antitoxin domain paralysis antitoxin binding the DNA the cellular partner, Interestingly, binding causes (36). cognate mRNA toxin its the of toxin, excess RelB, the in cell When is in (35). roles RelE, persistence important toxin–antitoxin cellular with 36 including about chromosome, physiology, of the 1 on is found It systems RelB. and RelE pair 34). antitoxin (31, site binding O1 weaker LacI magnitude the of of disso- orders that vivo three than in almost an is having that O3 constant these LacI ciation between with energies sequences, site binding binding different binding two LacI the two reflects the CRP likely at for magnitude binding sites shifts in each negative difference and The of RNAP. LacI role and for expres- shifts regulatory positive The expected showing plot. the site, the reflect above shifts noted The sion sites glucose. 2A, binding Fig. 0.5% in annotated shown with are with media position Sort-Seq nucleotide minimal each performed at M9 shifts we expression in Here, grown (31). cells sugars with lactose and when behavior glucose diauxie switch-like in catabolic results classic the that exhibits It site. binding o h ersosRlEadMr,teewr odt avail- data no were there MarR, and RelBE repressors the For reg- expected the showed qualitatively promoter each While h hr promoter, third The the consider we Next, ◦ 3) gi,tedfeetfaue nteepeso shift expression the in features different the Again, (39). C n ned h arcsagreed matrices the indeed, and S4 , Fig. Appendix, SI sascae ihmlil antibi- multiple with associated is mar, rel rmtrta rncie h toxin– the transcribes that promoter .coli E. relBE mar rmtri tefatvtdby activated itself is promoter or sgoni h rsneof presence the in grown is marR NSLts Articles Latest PNAS ee eedeleted. were genes n were and | f10 of 3 r =

BIOPHYSICS AND COMPUTATIONAL BIOLOGY A when considering other E. coli strains with LacI copy num- bers between about 10 and 1,000 copies per cell (SI Appendix, Fig. S5C). Additional characterization of the measurement sensitivity and dynamic range of this approach is noted in SI Appendix.

Sort-Seq Discovers Regulatory Architectures in Unannotated Regu- latory Regions. Given that more than one-half of the promoters in E. coli have no annotated transcription factor binding sites B in RegulonDB, we narrowed our focus by using several high- throughput studies to identify candidate genes to apply our approach (44, 45). The work by Schmidt et al. (45) in partic- ular measured the protein copy number of about one-half the E. coli genes across 22 distinct growth conditions. Using these data, we identified genes that had substantial differential gene expression patterns across growth conditions, thus hinting at the presence of regulation and even how that regulation is elicited by Fig. 3. Expression shifts reflect binding by regulatory proteins. (A) Expres- environmental conditions (additional details are in SI Appendix, sion shifts for the rel promoter but in a ∆relBE genetic background. Cells Fig. S2). On the basis of this survey, we chose to investigate the were grown in conditions identical to Fig. 2B but no longer show a sub- promoters of purT, xylE, and dgoRKADT. To apply Sort-Seq in stantial positive expression shift across the annotated RelBE binding site. (B) a more exploratory manner, we considered three 60-bp muta- Expression shifts for the mar promoter but in a ∆marR genetic background. genized windows spanning the intergenic region of each gene. The positive expression shift observed where MarR is expected to bind is no longer observed. Binding site annotations are identified in blue for RNAP While it is certainly possible that regulatory features will be out- sites, green for repressor sites, yellow for activator sites, and gray for ribo- side of this window, a search of known regulatory binding sites somal binding site and start codons. These annotations refer to the binding suggests that this should be sufficient to capture just over 70% of sites noted on RegulonDB that were observed in the Sort-Seq data. regulatory features in E. coli and provide a useful starting point (SI Appendix, Fig. S6). The purT promoter contains a simple repression architecture and Identification of Transcription Factors with DNA Affinity Chromatog- is repressed by PurR. The first of our candidate promoters is asso- raphy and Quantitative Mass Spectrometry. Next, it was important ciated with expression of purT, one of two genes found in E. coli to show that DNA affinity chromatography could be used to that catalyze the third step in de novo purine biosynthesis (46, identify transcription factors in E. coli. In particular, a challenge 47). Due to a relatively short intergenic region about 120 bp in arises in identifying transcription factors in most organisms due length that is shared with a neighboring gene yebG, we also per- to their very low abundance. In E. coli, the cumulative distri- formed Sort-Seq on the yebG promoter [oriented in the opposite bution in protein copy number shows that more than one-half have a copy number less than 100 per cell, with 90% having a copy number less than 1,000 per cell. This is several orders of magnitude below that of many other cellular proteins (40). AB We began by applying the approach to known binding sites for LacI and RelBE. For LacI, which is present in E. coli in about 10 copies per cell, we used the strongest binding site sequence, Oid (in vivo Kd ≈ 0.05 nM), and the weakest natural binding site sequence, O3 (in vivo Kd ≈ 110 nM) (31, 34, 41). In Fig. 4A, we plot the protein enrichments from each transcription fac- tor identified by mass spectrometry. LacI was found with both DNA targets, with fold enrichment greater than 10 in each case, and it was significantly higher than most of the proteins detected (indicated by the shaded region in Fig. 4A, which represents the 95% probability density region of all proteins detected, includ- ing non-DNA binding proteins). Purification of LacI with about 10 copies per cell using the weak O3 binding site sequence is near the limit of what would be necessary for most E. coli promoters. To ensure that this success was not specific to LacI, we also applied chromatography to the RelBE binding site. RelBE pro- Fig. 4. DNA affinity purification and identification of LacI and RelBE by vides an interesting case, since the strength of binding by RelB to mass spectrometry using known target binding sites. (A) Protein enrich- ment using the weak O3 binding site and strong synthetic Oid binding DNA is dependent on whether RelE is bound in complex to RelB sites of LacI. LacI was the most significantly enriched protein in each [with at least a 100-fold weaker dissociation constant reported in purification. The target DNA region was based on the boxed area of the absence of RelE (42, 43)]. As shown in Fig. 4B, we found the lac promoter schematic but with the native O1 binding site sequence over 100-fold enrichment of both proteins by mass spectrometry. replaced with either O3 or Oid. Data points represent average protein To provide some additional intuition into these results, we also enrichment for each detected transcription factor measured from a sin- considered the predictions from a statistical mechanical model gle purification experiment. (B) For purification using the RelBE binding of DNA binding affinity (SI Appendix). As a consequence of site target, both RelB and its cognate binding partner RelE were signifi- performing a second reference purification, we find that fold cantly enriched. Data points show the average protein enrichment from enrichment should mostly reflect the difference in binding energy two purification experiments. The target binding site is shown by the boxed region of the rel promoter schematic. Data points in each purifica- between the DNA sequences used in the two purifications and tion show the protein enrichment for detected transcription factors. The be much less dependent on whether the protein was in low or gray shaded regions show where 95% of all detected protein ratios were high abundance within the cell. This seemed to be the case found.

4 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1722055115 Belliveau et al. Downloaded by guest on October 2, 2021 Downloaded by guest on October 2, 2021 elva tal. et Belliveau the between region 120-bp approximately the the of for architecture shown regulatory the ers 5. Fig. oe,w aeol lte h ein hr euainwas regulation where we regions While the plotted pro- 5A. each only apparent. Fig. for have shown in than we region shown moter, larger asso- a are The on glucose. plots Sort-Seq performed 0.5% shift with media expression minimal ciated M9 in grown cells the 5 Fig. of in (schematic (48)] direction repression, simple of of and thermodynamic units repressor a in PurR to energies fit the yielding were for Data shown sites. are binding logos RNAP of sequence genes and the matrices between weight region Identical intergenic the (D) within detected. protein all (E represents of region region shaded density gray to the probability and 95% replicates, the three from values ment hypoxanthine of presence the media in minimal (10 performed M9 was in Binding grown factors glucose. cells transcription 0.5% from with produced for was values lysate Cell enrichment plotted. protein are and site, repressor fied shifts Expression (C (B) green. blue. (100 in adenine with identified mented are and matrix the for energy Cells The an gene. glucose. of coding 0.5% the ence native with of each media sites of minimal binding RNAP codon M9 start in grown the for posi- were to shown with promoter, relative are each noted for shifts observed tions Expression was regulation directions. where opposite regions 60-bp in code which genes, E D B A umr frgltr idn ie n rncito atr htbind that factors transcription and sites binding regulatory of Summary ) B /l.Errbr ersn h E acltduiglgpoenenrich- protein log using calculated SEM the represent bars Error µg/ml). u efre ihclscnann a containing cells with performed but purT purT otSqdsigihsdrcinlrgltr etrsaduncov- and features regulatory directional distinguishes Sort-Seq N fnt hoaorpywspromduigteidenti- the using performed was chromatography affinity DNA ) rmtrbti 9mnmlmdawt .%guoesupple- glucose 0.5% with media minimal M9 in but promoter and yebG purT /l.Apttv erso iei noae in annotated is site repressor putative A µg/ml). rmtr,w efre otSqwith Sort-Seq performed we promoters, k B T . rmtrwr eemndtruhinfer- through determined were promoter C purT .T ei u exploration our begin To A). ceai is schematic A (A) promoter. ∆purR yebG eei background. genetic and yebG 0and −10 Energy purT. and purT −35 natvtn eeepeso anttdi rnei Fig. in factor transcription orange multiple in of (annotated sites. possibility binding the expression involved suggesting be gene to 6A), seemed activating RNAP the of in to upstream able region entire were the We between the S8E). site to Fig. binding tive Appendix, RNAP by (SI the work (45)] [the locate media al. other promoter et in the Schmidt Interestingly, expression Sort-Seq source. no performing carbon essentially by this exhibited that in proceeded found grown and we cells xylose (45), in al. to et sensitive Schmidt from was data our From the xylose. of of analysis uptake in of involved expression mediated symporter with xylose/proton associated xylose a CRP. was of considered and we presence that XylR the moter of in binding induced is through operon energies xylE The binding 5E the Fig. absolute infer in Appendix). model in to resulting PurR the us show and enabled repres- RNAP This describe of to PurR. model by repression thermodynamic simple sion a developing a by of ysis appearance the for the (51) With architecture Sort-Seq. by fied oiggnsof genes coding location. consistent this is at binding which PurR 5D), with (Fig. disappeared site binding repressor a condition in growth adenine-rich in but the in shown per- more we once As validation, with Sort-Seq further PurR formed (50). As of sequence. PurR enrichment site binding substantial binds putative a this observed also now that we 5C, derivative Fig. therefore, purine which We, hypoxanthine, a of (47). presence is region the in intergenic purification this the repeated within bind might it aty ewr bet ne neeg arxt h putative the the to of S4 matrix that (r matched energy PurR that Impor- repressor, an specificity well-characterized infer bind sequence mechanism. with to to allosteric site able repressor ligand an were related through we that a tantly, possibly reasoned require we DNA, may adenine, the of factor presence transcription the the in when only grown observed fac- were repression transcription With cells S8 C). enriched were Fig. Appendix, substantially we (SI any however, tor attempt, identify putative initial to our this we unable In using factors, sequence. chromatography site transcription affinity binding associated DNA applied their next also, but sequences 5B). Fig. in annotation (green expression in shift between site repressor this putative 5B, the a Fig. revealed in features shown also two as condition adenine, these growth of While addition an on site. present of binding still RNAP were inference the an as of Through identified regions were plot. −35 features two shift these expression growth matrix, energy In the media. 5A, in growth regions (Fig. the in adenine adenine without purine the we of the (46), absence cell of the Sort-Seq within concentrations performed purine over control tight S8 Fig. Appendix, (SI (49)] and potent response LexA [a damage of C DNA proteolysis the mitomycin elicit of to known presence cross-linker the DNA in Sort-Seq repeating by that confirm and to search able also computational the The 9 were a We (48). shifted experimentally. on confirmed previously is based not identified site was binding was annotation RNAP what previous RNAP. the from and that downstream LexA find bp for did sites we binding However, containing work, prior with esmaietergltr etrsbtenthe between features regulatory the summarize we 5E, Fig. In olwn u taeyt n o nyteregulatory the only not find to strategy our Following of role the Given the For .W lontdCI-hpdt fPr htsgetthat suggest that PurR of data ChIP-chip noted also We ). 5and −35 yebG ∆ rmtrwsidcdi epnet N damage DNA to response in induced was promoter xylE yebG purR 0RA idn ie,idctdb positive a by indicated sites, binding RNAP −10 .I addition, In 6A). Fig. in blue in (annotated gene rmtr h etrswr agl consistent largely were features the promoter, purT tan nteasneo uR h putative the PurR, of absence the In strain. purT and purT ntesnhsso uie n the and purines of synthesis the in yebG purT ,w bevdtonegative two observed we Right), rmtr eetne u anal- our extended we promoter, nldn h etrsidenti- features the including , rmtri h rsneor presence the in promoter k h etuanttdpro- unannotated next The B adtoa eal r in are details (additional T Fig. , Appendix (SI 0.82) = NSLts Articles Latest PNAS 0and −80 nris(2,adwe and (52), energies A, and B, 0b rela- bp −40 0and −10 | D). f10 of 5 xylE, xylE SI

BIOPHYSICS AND COMPUTATIONAL BIOLOGY ACB

D

Fig. 6. Sort-Seq identifies a set of activator binding sites that drive expression of RNAP at the xylE promoter. (A) Expression shifts are shown for the xylE promoter, with Sort-Seq performed on cells grown in M9 minimal media with 0.5% xylose. The −10 and −35 regions of an RNAP binding site (blue) and a putative activator region (orange) are annotated. (B) DNA affinity chromatography was performed using the putative activator region, and protein enrichment values for transcription factors are plotted. Cell lysate was generated from cells grown in M9 minimal media with 0.5% xylose, and binding was performed in the presence of xylose supplemented at the same concentration as during growth. Error bars represent the SEM calculated using log protein enrichment values from three replicates. The gray shaded region represents the 95% probability density region of all proteins detected. (C) An energy matrix was inferred for the region upstream of the RNAP binding site. The associated sequence logo is shown above the matrix. Two binding sites for XylR were identified (SI Appendix, Figs. S4 and S8F) along with a CRP binding site. (D) Summary of regulatory features identified at xylE promoter, with the identification of an RNAP binding site and tandem binding sites for XylR and CRP.

We applied DNA affinity chromatography using a DNA tar- (56). We began by measuring expression from a nonmutagenized get containing this entire upstream region. Due to the stringent dgoRKADT promoter reporter in response to glucose, galactose, requirement for xylose to be present for any measurable expres- and D-galactonate. Cells grown in galactose exhibited higher sion, xylose was supplemented in the lysate during binding with expression than in glucose as found by Schmidt et al. (45), and the target DNA. In Fig. 6B, we plot the enrichment ratios from they exhibited even higher expression when cells were grown in this purification and find XylR to be most significantly enriched. D-galactonate (SI Appendix, Fig. S9A). This likely reflects the From an energy matrix inferred for the entire region upstream physiological role provided by the genes of this promoter, which of the RNAP site, we were able to identify two correlated seems necessary for metabolism of D-galactonate. We, therefore, 15-bp regions (dark yellow shaded regions in Fig. 6C) (Pearson proceeded by performing Sort-Seq with cells grown in either correlation r = 0.74 between energy matrices from each binding glucose or D-galactonate, since these appeared to represent dis- site). Mutations of the XylR protein have been found to diminish tinct regulatory states, with expression low in glucose and high in transport of xylose (53), which in light of our result, may be due D-galactonate. Expression shift plots from each growth condi- in part to a loss of activation and expression of this xylose/proton tions are shown in Fig. 7A. symporter. This is in addition to the loss of activation expected We begin by considering the results from growth in glucose by XylR of the high-affinity xylose uptake system XylFHG (53). (Fig. 7A, Upper). Here we identified an RNAP binding site These binding sites were also similar to those found on two other between −30 and −70 bp relative to the native start codon for promoters known to be regulated by XylR (xylA and xylF pro- dgoR (SI Appendix, Fig. S9B). Another distinct feature was a pos- moters), which also exhibit tandem XylR binding sites and strong itive expression shift in the region between −140 and −110 bp, binding energy predictions with our energy matrix (SI Appendix, suggesting the presence of a repressor binding site. Apply- Fig. S8F). ing DNA affinity chromatography using this target region, we Within the upstream activator region in Fig. 6A, there still observed an enrichment of DgoR (Fig. 7B), suggesting that the appeared to be a binding site unaccounted for upstream of promoter is indeed under repression and regulated by the first the tandem XylR binding sites. From the energy matrix, we coding gene of its transcript. As further validation of binding by were further able to identify a binding site for CRP, which is DgoR, the positive shift in expression was no longer observed noted in Fig. 6C. While we did not observe a significant enrich- when Sort-Seq was repeated in a ∆dgoR strain (Fig. 7D and ment of CRP in our protein purification, the most energetically SI Appendix, Fig. S9C). We also were able to identify addi- favorable sequence predicted by our model, TGCGACCNA- tional RNAP binding sites that were not apparent due to binding GATCACA, closely matches the CRP consensus sequence of by DgoR. While only one RNAP −10 motif is clearly visible TGTGANNNNNNTCACA. In contrast to the lac promoter, in the sequence logo shown in Fig. 7C (top sequence logo; binding by CRP here seems to depend more on the right half of TATAAT consensus sequence), we used simulations to show the binding site sequence. CRP is known to activate promoters that the entire sequence logo shown can be explained by the con- by multiple mechanisms (54), and CRP binding sites have been volution of three overlapping RNAP binding sites (SI Appendix, found adjacent to the activators XylR and AraC (53, 55), in line Fig. S9F). with our result. While additional work will be needed to char- Next, we consider the D-galactonate growth condition (Fig. acterize the specific regulatory mechanism here, it seems that 7A, Lower). Like in the expression shift plot for the ∆dgoR strain activation of RNAP is mediated by both CRP and XylR, and grown in glucose, we no longer observe the positive expression we summarize this result in Fig. 6D (considered further in SI shift between −140 and −110 bp. While there are still several Appendix). positions between −120 and −100 bp that are still positive, this The dgoRKADT promoter is autorepressed by DgoR with tran- can be attributed to a nonoptimal −10 binding site sequence for scription mediated by class II activation by CRP. As a final RNAP (wild type, TACATT) (Fig. 7C). The loss of the repres- illustration of the approach developed here, we considered the sive feature, therefore, suggests that DgoR may be induced by unannotated promoter of dgoRKADT. The operon codes for D-galactonate or a related metabolite. However, in comparison D-galactonate–catabolizing enzymes; D-galactonate is a sugar with the expression shifts in the ∆dgoR strain grown in glucose, acid that has been found as a product of galactose metabolism there were some notable differences in the region between −160

6 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1722055115 Belliveau et al. Downloaded by guest on October 2, 2021 Downloaded by guest on October 2, 2021 insit u omttn the mutating to due shifts sion 7. Fig. elva tal. et Belliveau 7E. Fig. in CRP be between to in estimate energy we interaction which the RNAP, data infer and these further Importantly, to by us site. dominated allowed binding now site is RNAP binding expression CRP-activated Appendix, because CRP the (SI likely putative CRP is by This the binding S9E). Fig. resembled at 500 clearly logo more of even sequence that presence a the in the from resulted in induction Growth strong provided (41)]. [derivative intra- CRP TK310 by control and activation to of for synthesis unable necessary levels thus cAMP cAMP is cellular for it needed and respectively, are degradation, phospho- which and (cpdA), (cyaA) cyclase diesterase adenlyate lacks JK10 Strain in cAMP. Sort-Seq repeated we as described. sequence defined previously this been is not although and has (54), specificity observed activation previously CRP-dependent been II class has we CRP where by with action overlaps directly a find site to binding expect would the of half the right to the (Fig. contrast TGTGA In sequence sequence. the identifies distribution, logo 7C, value sequence parameter inferred The the site. of percent RNAP, 95 and and with CRP associated between bounds energy lower interaction and The upper CRP. the and represent DgoR subscripts for respectively. and sites superscripts binding the and probability where sites 95% binding the RNAP multiple represents region shaded in gray grown in the cells detailed dgoRKADT and wild-type between (further replicates, on region glucose three formed the 0.5% from targeting with (C values in performed media detected. enrichment annotated was proteins protein site purification all log binding affinity of using DNA region calculated (B) density SEM respectively. the green, resent and the orange of in bp shown are sites D D D A glcoae(Lower -galactonate glcoaemyata nidcrfrDo.(E DgoR. for inducer an as act may -galactonate oioaeadbte dniyti uaieCPbnigsite, binding CRP putative this identify better and isolate To .W umrz h dnie euaoyfeatures regulatory identified the summarize We Appendix). SI Lower 4 p ee efideiec o nte R binding CRP another for evidence find we Here, bp. −140 The dgoRKADT rmtrwe efre na in performed when promoter ,wihmthstelf ieo h R consensus CRP the of side left the matches which ), dgoRKADT 0and (−10 sites binding RNAP as identified Regions ). rmtr h rncito atrDo a on oterce mn h rncito atr lte.Errbr rep- bars Error plotted. factors transcription the among enriched most found was DgoR factor transcription The promoter. rmtri nue ntepeec of presence the in induced is promoter 5RA idn ie hstp finter- of type This site. binding RNAP −35 utpeRA idn ie eeietfiduigSr-e aapromdi a in performed data Sort-Seq using identified were sites binding RNAP Multiple A. .coli E. dgoRKADT lac D tanJ1 rw n500 in grown JK10 strain and xrsinsit r hw o the for shown are shifts Expression (D) (54)]. activation II [class site binding CRP a identifying -galactonate, −7.3 dgoRKADT eunelgswr nerdfrtems ptem6-prgo soitdwt h ptemRNAP upstream the with associated region 60-bp upstream most the for inferred were logos Sequence ) xylE ∆dgoR rmtraesonfrclsgoni 9mnmlmdawt ihr05 lcs ( glucose 0.5% either with media minimal M9 in grown cells for shown are promoter .Blwti,asqec oowsas nerduigdt rmSr-e per- Sort-Seq from data using inferred also was logo sequence a this, Below S9). Fig. Appendix, SI k rmtr,however, promoters, B umr frgltr etrsietfida the at identified features regulatory of Summary ) T eei akrudgoni .%guoe hsrsmlsgot in growth resembles This glucose. 0.5% in grown background genetic frhrdetailed (further rmtrand promoter cAMP µM D glcoaedet oso ersinb gRadatvto yCP ( CRP. by activation and DgoR by repression of loss to due -galactonate µM BC etr 5) ihrpeso yPr.The PurR. by repression hypotheses. with regulatory (51), of tecture collection a the to For led annotation: analysis regulatory having our prior promoters Here, studied no then or we (34). binding little cell success, per O3 copies this 10 weak by about only Emboldened the in present particular, with is LacI In although identified site, sensitive. unambiguously highly was 43). were exper- LacI promoters 39, chromatography these affinity 34, on DNA 33, iments that 31, find (13, we knowledge Importantly, established with consistent ae rmtr of promoters tated regulatory condition-dependent growth regula- the architectures. of and specificity identity transcription into factor binding insight quantitative DNA further provide energy the factors tory spectrometry of describe mass inference that cognate and by matrices the modeling them enrich Information-based identify to analysis. used functional and then con- of factors are oligonucleotides locations transcription sites the DNA binding elucidate these sites. taining first binding to factor used Sort- transcription assay, is reporter (13), parallel massively Seq the A bacteria. dissecting in regulatory for sequences uncharacterized procedure previously of systematic mechanisms a functional established have We Discussion 5 r hw nbu,adpttv ciao n erso binding repressor and activator putative and blue, in shown are −35) E ovldt hsapoc,w xmndfu rvosyanno- previously four examined we approach, this validate To purT rmtr eietfidasml ersinarchi- repression simple a identified we promoter, lac, dgoRKADT rel, and mar, ∆ rmtr ihteietfiainof identification the with promoter, ε dgoR i a nerdt be to inferred was , yebG tangoni 9minimal M9 in grown strain NSLts Articles Latest PNAS D purT, n u eut were results our and , glcoae suggesting -galactonate, xylE 4 pand bp −145 and xylE, Upper rmtrwas promoter −7.3 r0.23% or ) Expres- A) | −1. +1.9 dgoR. f10 of 7 −110 4 k B T ,

BIOPHYSICS AND COMPUTATIONAL BIOLOGY found to undergo activation only when cells are grown in xylose, (54, 77) suggest that this approach will permit regulatory dissec- likely due to allosteric interaction between the activator XylR tion in any bacterium that supports efficient transformation by and xylose and activation by CRP (53, 55). Finally, in the case plasmids. Additionally, although we have focused on bacteria, of dgoR, the base pair resolution allowed us to tease apart over- our general strategy should be feasible for dissecting regula- lapping regulatory binding sites, identify multiple RNAP binding tion in a number of eukaryotic systems—including human cell sites along the length of the promoter, and infer further quanti- culture—using massively parallel reporter assays (14–16) and tative detail about the interaction between the identified binding DNA-mediated protein pull-down methods (21, 22) that have sites for CRP and RNAP. We view these results as a critical first already been established. step in the quantitative dissection of transcriptional regulation, which will ultimately be needed for a predictive understanding Materials and Methods of how such regulation works. SI Appendix has extended experimental details. While our results show the successful identification of reg- ulatory binding sites and regulatory mechanism at previously Bacterial Strains. All E. coli strains used in this work were derived from unannotated promoters, there also remain important challenges. K-12 MG1655, with deletion strains generated by the lambda red recom- The uncharacterized genes were selected based on genome-wide binase method (78). In the case of deletions for lysA (∆lysA::kan), purR studies (44, 45), and indeed, the hints of regulation in these (∆purR::kan), and xylE (∆xylE::kan), strains were obtained from the Coli Genetic Stock Center (Yale University) and transferred into a fresh MG1655 data were a necessary part of our strategy to systematically dis- strain using P1 transduction. The others were generated in house and sect each promoter. Datasets that quantitate protein abundance include the following deletion strains: ∆lacIZYA, ∆relBE::kan, ∆marR::kan, across a number of growth conditions, like those available in E. and ∆dgoR::kan. Details on strain construction are provided in SI coli (45) and yeast (57), or alternatively, transcript abundance Appendix. using RNA sequencing (RNA-Seq) will provide an important starting point for the dissection of regulatory mechanism in other Sort-Seq. Mutagenized single-stranded oligonucleotide pools were pur- bacteria. chased from Integrated DNA Technologies. Library oligonucleotides were An important aspect of the presented approach is that it can PCR amplified, inserted into the PCR-amplified plasmid backbone (i.e., vec- be applied to any promoter sequence, and there are a number tor) of pJK14 (SC101 origin) (13) by Gibson assembly, and electroporated of ways that throughput can be increased further. Microarray- into cells after drop dialysis in water. Cell libraries were then grown to saturation in LB and diluted 1:10,000 into the appropriate growth media synthesized promoter libraries and measurement of expression for the promoter under consideration, and grown to an optical density from barcoded transcripts using RNA-Seq instead of flow cytom- at 600 nm of 0.2–0.4. A Beckman Coulter MoFlo XDP cell sorter was used etry can be used to allow multiple loci to be studied simulta- to sort cells by fluorescence, with 500,000 cells collected into each of the neously (14, 18). Landing pad technologies for chromosomal four bins. Sorted cells were then regrown overnight in 10 mL of LB media integration (58–60) should enable massively parallel reporter under kanamycin selection. The plasmids in each bin were miniprepped assays to be performed in chromosomes instead of on plasmids. (Qiagen) after overnight growth, and PCR was used to amplify the mutated Techniques that combine these assays with transcription start site region from each plasmid for Illumina sequencing. SI Appendix has addi- readout (61) may provide additional resolution, further allow- tional details on library construction and Sort-Seq as well as on calculating ing the molecular regulators of overlapping RNAP binding sites expression shift plots and energy matrices. to be deconvolved or the contributions from separate RNAP binding sites, like those observed on the dgoR promoter, to be DNA Affinity Chromatography and Liquid Chromatography-MS/MS. SILAC la- beling (27, 28, 30) was implemented by growing cells (MG1655 ∆lysA) in better distinguished. As the number of regulatory regions under 13 15 either the stable isotopic form of lysine ( C6H14 N2O2) or natural form. SI study increases, it will also be important to develop additional Appendix has details on lysate preparation. analysis tools that provide automated identification of regulatory DNA affinity chromatography was performed by incubating cell lysate binding sites. (∼150 mg/mL protein) with magnetic beads (Dynabeads MyOne T1; To identify transcription factors across many target binding ThermoFisher) containing tethered DNA (streptavidin-biotin linkage). sites, DNA affinity chromatography samples can be further mut- ssDNA was purchased from Integrated DNA Technologies with the biotin liplexed using isobaric labeling strategies (62, 63). Continued modification on the 50 end of the oligonucleotide sense strand. Cell lysates were incubated on a rotating wheel with the DNA tethered beads overnight performance improvements in mass spectrometer sensitivity and ◦ sample processing (64–66) will also make this assay less onerous at 4 C. Elution was achieved by cleaving the DNA with the restriction to apply across many targets and different binding conditions. enzyme PstI, and samples were then prepared for mass spectrometry by in- gel digestion with endoproteinase Lys-C. Liquid chromatography tandem This will be especially important for situations where the data mass spectrometry experiments were carried out as previously described suggest that a small molecule effector might be acting to mod- (79), and they are further detailed in SI Appendix. Thermo RAW files were ulate binding of the transcription factor to its target sequence, processed using MaxQuant (v. 1.5.3.30) (80). requiring multiple binding conditions to be tested. Perform- ing reporter assays in transcription factor deletion strains will Code Availability and Data Analysis. All code used for processing data and continue to play an important role in promoter dissection as plotting as well as the final processed data, plasmid sequences, and primer we have shown for a variety of the promoters, and it will sequences can be found on our GitHub repository (https://www.github.com/ provide a secondary means with which to identify and vali- RPGroup-PBoC/sortseq belliveau; DOI: 10.5281/zenodo.1184169). Thermo date binding sites. Genome-wide gene deletion libraries are RAW files for mass spectrometry are available at the jPOSTrepo repository now available for a wide variety of bacteria (67–72), and it is (81) (accession no. PXD007892). Sort-Seq sequencing files are available at the Sequence Read Archive (accession no. SRP121362). now possible to perform genetic perturbations using CRISPR interference (73, 74) that should open up the possibility of ACKNOWLEDGMENTS. We thank David Tirrell, Bradley Silverman, and Seth applying such perturbation strategies more easily in less studied Lieblich for access to their Beckman Coulter MoFlo XDP cell sorter. Jost organisms. Vielmetter and Nina Budaeva provided access to their Cell Disruptor. We also thank Hernan Garcia, Manuel Razo-Mejia, Griffin Chure, Suzan- Although our work was directed toward regulatory regions nah Beeler, Heun Jin Lee, Justin Bois, and Soichi Hirokawa for useful of E. coli, there are no intrinsic limitations that restrict the discussion. This work was supported by La Fondation Pierre-Gilles de analysis to this organism. Rather, most bacteria contain small Gennes; the Rosen Center at Caltech; NIH Grants DP1 OD000217 (Direc- intergenic regions several hundred base pairs in length that make tor’s Pioneer Award), R01 GM085286, 1R35 GM118043-01 (Maximizing Investigators’ Research Award), and 1S10RR029594-01A1; the Gordon and this approach especially suitable. The sequence specificity of Betty Moore Foundation through Grant GBMF227; and the Beckman Insti- most characterized prokaryotic transcription factors (75, 76) and tute. N.M.B. was supported by an HHMI International Student Research the sigma factors that allow RNAP to recognize each promoter Fellowship.

8 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1722055115 Belliveau et al. Downloaded by guest on October 2, 2021 Downloaded by guest on October 2, 2021 4 acaH,Pilp 21)Qatttv iscino h iperpeso input- repression simple the multiple of dissection mediated Quantitative chromosomally (2011) R of Phillips Regulation HG, Garcia (1997) 34. SB Levy MN, stress Alekshun toxin–Antitoxin Prokaryotic 33. (2005) A Løbner-Oleson SK, Christensen K, Gerdes 32. Kr ER, Eismann S, Oehler 31. cell bacterial to applied acids amino by labeling isotope Stable (2014) B Macek B, Soufi 30. parallel massively genome-integrated A (2017) BA Cohen JD, Dougherty BB, Maricque high-throughput transcription controlling 19. sequences from regulatory of logic Composability (2013) al. regulatory et S, Kosuri gene 18. Inferring (2012) al. et mammalian E, of dissection Sharon functional parallel 17. Massively (2012) al. pre- et RP, 2000 Patwardhan in motifs 16. regulatory of inducible dissection Systematic of (2013) optimization al. et and P, Kheradpour dissection 15. Systematic (2012) al. et charac- A, to sequencing Melnikov deep 14. Using (2010) EC Cox CG, Callan A, Murugan at JB, repression Kinney and activation of 13. mechanisms of Analysis (2009) SJW Busby SD, Minchin 12. proteins. DNA-binding bacterial of analysis genomic-scale initiation for ChIP-seq transcription (2015) intragenic JT Wade of suppression 11. Widespread (2014) al. et SS, Singh 10. 9 aln ,e l 21)Da ucin sacnrlhbi the in hub central a as functions DnaK (2012) al. protein et chaperonin-dependent G, of Calloni analysis Proteome-wide 29. SILAC, (2005) culture, al. cell et in MJ, acids Kerner amino 28. by labeling isotope Stable (2002) consensus al. display et to SE, way Ong new A 27. logos: Sequence (1990) RM Stephens sequence-function TD, of Schneider modeling 26. Quantitative MPAthic: (2016) JB Kinney WT, Ireland 25. quan- A fitness: Energy-dependent (2008) M Lassig CG, Callan J, Kinney V, Mustonen in units 24. transcriptional of regulation tight and Independent (1997) H Bujard R, Lutz 23. interac- factor-DNA transcription of measurement Systematic (2013) al. screen et interaction H, protein Mirzaei enhancer–Promoter DNA 22. SILAC-based functional A (2009) of M mapping Mann F, Systematic Butter G, (2016) Mittler al. 21. et CP, Fulco 20. elva tal. et Belliveau .ZegD osatnduC omnJ,MnhnS 20)Ietfiaino h CRP the of Identification (2004) SD Minchin JL, Hobman C, Constantinidou bacterial D, Zheng a 9. of landscape (2015) occupancy JT Wade Protein RP, (2009) Bonocora S 8. Tavazoie AK, Hottes dis- the T, of Vora Studies (2005) 7. SJW Busby J, Holdstock M, Harrison D, Hurd DC, Grainger exper- 6. of database A CollecTF: sequences (2013) I Erill regulatory JP, Cornish of DM, Sagitova ER, database White Kilic¸ S, RegTransBase–A 5. (2013) al. et MJ, systems Cipriano with 4. databases organism model Fusing M EcoCyc: 3. (2013) al. et IM, Keseler 2. .Gm-atoS ta.(06 euoD eso .:Hg-ee nerto fgene of integration High-level 9.0: version RegulonDB (2016) al. et S, Gama-Castro 1. uptfunction. output The resistance: antibiotic loci. response repression. in cooperate operon 9–22. pp York), New (Humana, B culture. network. neural in activity cis-regulatory of cells. determinants sequence DNA reveals assay reporter in translation and promoters. designed systematically 30:521–530. of thousands of measurements vivo. assay. in enhancers reporter parallel massively a using 23:800–811. enhancers human dicted assay. reporter parallel massively a using 30:271–277. cells human in enhancers sequence. regulatory USA transcriptional Sci a Acad of mechanism biophysical the terize promoters. bacterial 119–134. pp 883, Vol York), Biology Systems Prokaryotic H-NS. by profiling. transcriptional vivo 5893. in and vitro in using 327–340. pp York), New (Humana, Proteins. Binding genome. coli of tribution bacteria. in sites 42:D156–D160. factor-binding transcription transcriptional validated investigating imentally for resource A literature: prokaryotes. in on regulation based interactions and Res Acids biology. D143. odn nEceihacoli. Escherichia in folding proteomics. expression to approach accurate 1:376–386. and simple a as sequences. bioRxiv:054676. assays. parallel massively for relationships sites. binding factor transcription yeast USA Sci of Acad evolution the for model titative Res Acids coli Escherichia proteins. regulatory gene USA candidate Sci Acad identifies Natl spectrometry Proc mass targeted by tions elements. DNA functional to 19:284–293. proteins binding candidate identifies that interference. CRISPR with connections beyond. and clustering motif coexpression, regulation, nhR ta.(03 RDRC rkroi aaaeo eeregulation. gene of database Prokaryotic PRODORIC: (2003) al. et R, unch ¨ chromosome. uli cd Res Acids Nucleic dWarscheid ed (SILAC), Culture Cell in Acids Amino by Labeling Isotope Stable uli cd Res Acids Nucleic ee Dev Genes o Cell Mol elRep Cell 31:266–269. 25:1203–1210. uli cd Res Acids Nucleic shrci coli Escherichia 107:9158–9163. 105:12376–12381. a e Microbiol Rev Nat i h aRO h eROadAa/1I euaoyelements. regulatory AraC/I1-I2 and TetR/O the LacR/O, the via rcNt cdSiUSA Sci Acad Natl Proc rcNt cdSiUSA Sci Acad Natl Proc 35:247–253. 1:251–264. shrci coli Escherichia a Biotechnol Nat 28:214–219. Methods 45:e16–e16. 110:3645–3650. mrH M H, amer ¨ mar 41:D605–D612. Cell M Genomics BMC APrcpo rti n N oyeaeaogthe along polymerase RNA and protein cAMP-receptor d rsmvthI atneoT Hmn rs,New Press, (Humana TJ Santangelo I, Artsimovitch eds , 18:6097–6100. regulon. hPSqfrGnm-cl nlsso atra DNA- Bacterial of Analysis Genome-Scale for ChIP-Seq 47:6–12. 122:209–220. MOJ EMBO 2:371–382. . 30:265–270. rcNt cdSiUSA Sci Acad Natl Proc le-ilB(90 h he prtr ftelac the of operators three The (1990) B uller-Hill ¨ Science o Biol Mol J 108:12173–12178. 9:973–979. 102:17693–17698. 14:213–221. 354:769–773. 41:2067–2075. uli cd Res Acids Nucleic 110:14024–14029. uli cd Res Acids Nucleic o elProteomics Cell Mol .coli E. uli cd Res Acids Nucleic a Biotechnol Nat a Biotechnol Nat eoeRes Genome eoeRes Genome chaperone 44:D133– rcNatl Proc rcNatl Proc 32:5874– Nucleic Nucleic E. 9 eBrrii ,e l 20)Acmlt olcino igegn eeinmutants deletion single-gene of collection complete A (2008) al. et V, Berardinis de libraries deletion genome-scale 69. two of analysis and Construction (2017) al. of et BM, Construction Koo (2006) 68. al. et T, Baba 67. proteomics quantitative A (2014) HG Stunnenberg NC, Hornig LN, structure Nguyen NC, proteome Hubner of 66. exploration Mass-spectrometric (2016) M Mann R, targeted Aebersold with multiplexing 65. sample combine to strategy A (2017) cerevisiae al. Saccharomyces et in BK, strategy quantitation Erickson protein quantification 64. Multiplexed novel (2004) al. A et tags: PL, Ross mass 63. Tandem (2003) al. et A Thompson 62. “MASTER”: readout, end transcript systematic Massively (2015) al. et IO, dissec- Vvedenskaya Systematic 2018) 61. 1, (February S Kosuri H, Kim K, Insigne AD, Tripp G, Urtecho 60. the of mutagenesis Comprehensive (2016) SL Chen Y, Wan synthetic TT, Susanto large H, of Zhang integration 59. chromosomal Site-specific (2010) datasets EC abundance Cox protein TE, of Kuhlman Unification 58. (2018) GW Brown A, Baryshnikova B, Ho 57. in regulon PurR The beyond. (2011) and al. et yeast BK, In Cho biosynthesis: nucleotide 47. purine of Regulation (2006) condition-dependent RJ and Rolfes quantitative 46. The (2016) inference. al. network et gene A, robust Schmidt for crowds 45. of Wisdom (2012) al. et interferase D, Marbach RNA Messenger 44. (2008) K Gerdes MG, Jørgensen J, transcriptional Borch of M, mechanism Overgaard Structural 43. (2008) M Ikura M, Inouye Y, Zhang GY, of control Li transcriptional Combinatorial 42. protein (2007) T Hwa absolute MH, Saier Quantifying Z, Zhang (2014) T, Kuhlman JS 41. Weissman C, Gross multiple D, the of Burkhardt repressor GW, the MarR, Li of 40. Characterization (1995) SB Levy AS, Seoane of 39. activation transcriptional for factor accessorial an Fis, (1997) JL Rosner RG, Martin 38. of RelE and RelB (2009) K Gerdes J, Borch M, Overgaard ribonu- 37. efficient highly A RelE: toxin Bacterial (2013) K Gerdes J, Borch M, Overgaard 36. 6 oprR 17)Teuiiaino -aatnt n D-2-oxo-3-deoxygalactonate and D-galactonate of utilisation The tran- (1978) the RA of Cooper analysis 56. Computational (2001) MS Gelfand AA, Mironov ON, Laikova initiation transcription of 55. regulation global and Local (2016) SJW Busby DF, Browning 54. the of binding corepressor and characterization Structural bind- (1992) LexA H of Zalkin KY, analysis Choi Genomic (2005) 50. K Struhl GM, Church NB, Reppas of JT, Identification (1997) Wade DF 49. Almeida ABF, Pacheco AT, Vasconcelos MR, Lomba 48. bacterial underlying mechanisms Molecular (2014) K Gerdes E, Maisonneuve 35. 3 ogS akC(97 raiainadrglto fteDxls D-xylose the of regulation and Organization (1997) C Park relationships S, sequence-function quantitative Song Learning 53. (2016) JB Models. Kinney numbers: GS, the Atwal by regulation 52. Transcriptional (2005) al. et L, Bintu 51. of subtilis. Bacillus for collection. keio The mutants: knockout blood. or cells primary in interactions 14:1315–1329. DNA–protein identify to tool function. and characterization. signature protein 65:361–370. high-throughput for assays proteomics reagents. tagging isobaric spectrometry/mass amine-reactive using mass by mixtures protein complex of spectrometry. analysis comparative for yields. transcript and slippage, transcriptional Cell selection, site start Transcription coli. E. in assay reporter controlling multiplexed elements sequence of tion uropathogenic in pili 1 type of regulation coli Escherichia novel reveals switch regulatory promoter constructs. proteome. cerevisiae Saccharomyces quantitative a yields Trans Soc Biochem proteome. Methods 857. controls RelE 380:107–119. the of autorepression coli. Escherichia of operon resources. lactose the cellular of allocation underlying principles 157:624–635. reveals rates synthesis in operon (mar) resistance antibiotic Rob. or SoxS, MarA, activator the of the RelB. in motif ribbon–helix–helix the via Biol transcription represses that complex residues. catalytic atypical using specificity 52:8633–8642. substrate exquisite with clease citoa euaino ets tlzto ytm ntegmasbiiinof subdivision by gamma the in systems utilization pentose Proteobacteria. of regulation scriptional bacteria. in 179:7025–7032. coli Escherichia the of nature sites. permissive target unconventional the reveals ing damage-inducible DNA a as Res persisters. in experiments. parallel massively from Dev Genet cntbce baylyi Acinetobacter shrci coli Escherichia shrci coli Escherichia 39:6456–6464. mar 60:953–965. 394:183–196. mlil niitcrssac)pooe of promoter resistance) antibiotic (multiple 9:796–804. Cell a Biotechnol Nat 15:116–124. uli cd Res Acids Nucleic a e Microbiol Rev Nat relBE Nature nlChem Anal 157:539–548. . uierepressor. purine ESMcoilEcol Microbiol FEMS rcNt cdSiUSA Sci Acad Natl Proc -2 iceia n eeia studies. genetical and Biochemical K-12. 34:786–790. rncito ycniinlcooperativity. conditional by transcription elSys Cell -2 yRat satasrpinlactivator. transcriptional a as acts XylR K-12: 537:347–355. shrci coli Escherichia ADP1. 75:1895–1904. 34:104–111. 4:291–305.e7. shrci coli Escherichia ee Dev Genes 38:e92. 14:638–650. o ytBiol Syst Mol Bacteriol J Bacteriol J Biochemistry 205:315–322. shrci coli Escherichia ttPhys Stat J rcNt cdSiUSA Sci Acad Natl Proc eBRl nioi/oi module. antitoxin/toxin RelB/RelE σ o ytBiol Syst Mol 113:4182–4187. 19:2619–2630. shrci coli Escherichia 0pooesuigagenomically-encoded a using promoters 70 shrci coli Escherichia gene. 4:174–154. 174:6207–6214. shrci coli Escherichia 179:7410–7419. o elProteomics Cell Mol 162:1203–1243. 10.1021/acs.biochem.7b01069. , ESMcoilEcol Microbiol FEMS NSLts Articles Latest PNAS . 2:2006.0008. Bacteriol J shrci coli Escherichia shrci coli Escherichia -2MG1655. K-12 -2i-rm,single-gene in-frame, k-12 elSys Cell rhMicrobiol Arch eoeadidentifies and genome 104:6043–6048. o Microbiol Mol 177:3414–3419. 6:1–14. 3:1154–1169. rtoeRes Proteome J ntepresence the in shrci coli Escherichia 156:119–122. uli Acids Nucleic omatight a form Bacteriol J 1:199–206. | o Biol Mol J urOpin Curr Biochem o Cell Mol 69:841– f10 of 9 Mol J yebG fimS Mol Cell Nat

BIOPHYSICS AND COMPUTATIONAL BIOLOGY 70. Porwollik S, et al. (2014) Defined single-gene and multi-gene deletion mutant 76. Stewart AJ, Plotkin JB (2012) Why transcription factor binding sites are ten collections in Salmonella enterica sv Typhimurium. PLoS One 9:e99820. nucleotides long. Genetics 192:973–985. 71. Xu P, et al. (2011) Genome-wide essential gene identification in Streptococcus 77. Fekl´ıstov A, Sharon BD, Darst SA, Gross CA (2014) Bacterial sigma factors: A historical, Sanguinis. Sci Rep 1:125. structural, and genomic perspective. Annu Rev Microbiol 68:357–376. 72. Liberati NT, et al. (2006) An ordered, nonredundant library of Pseudomonas 78. Datsenko KA, Wanner BL (2000) One-step inactivation of chromosomal genes aeruginosa strain PA14 transposon insertion mutants. Proc Natl Acad Sci USA in K-12 using PCR products. Proc Natl Acad Sci USA 97:6640– 103:2833–2838. 6645. 73. Larson MH, et al. (2013) CRISPR interference (CRISPRi) for sequence-specific control of 79. Kalli A, Hess S (2011) Effect of mass spectrometric parameters on peptide and pro- gene expression. Nat Protoc 8:2180–2196. tein identification rates for shotgun proteomic experiments on an LTQ-orbitrap mass 74. Gordon GC, et al. (2016) CRISPR interference as a titratable, trans-acting regulatory analyzer. Proteomics 12:21–31. tool for Metab Eng in the cyanobacterium Synechococcus sp. strain PCC 7002. Metab 80. Cox J, et al. (2009) A practical guide to the MaxQuant computational platform for Eng 38:170–179. SILAC-based quantitative proteomics. Nat Protoc 4:698–705. 75. Lassig¨ M (2007) From biophysics to evolutionary genetics: Statistical aspects of gene 81. Okuda S, et al. (2017) jPOSTrepo: An international standard data repository for regulation. BMC Bioinformatics 8(Suppl 6):S7. proteomes. Nucleic Acids Res 45:D1107–D1111.

10 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1722055115 Belliveau et al. Downloaded by guest on October 2, 2021