A Database of Regulatory Networks in Gamma-Proteobacterial Genomes Abel D

Total Page:16

File Type:pdf, Size:1020Kb

A Database of Regulatory Networks in Gamma-Proteobacterial Genomes Abel D View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Open Repository and Bibliography - Luxembourg D98–D102 Nucleic Acids Research, 2005, Vol. 33, Database issue doi:10.1093/nar/gki054 TRACTOR_DB: a database of regulatory networks in gamma-proteobacterial genomes Abel D. Gonza´lez, Vladimir Espinosa, Ana T. Vasconcelos1, Ernesto Pe´rez-Rueda2 and Julio Collado-Vides3,* National Bioinformatics Center, Industria y San Jose´, Capitolio Nacional, CP. 10200, Habana Vieja, Habana, Cuba, 1National Laboratory for Scientific Computing, Avenue Getulio Vargas 333, Quitandinha, CEP 25651-075, Petropolis, Rio de Janeiro, Brazil, 2Depto. de Ingenierı`a Celular y Biocata´lisis, IBT-UNAM, Cuernavaca, Morelos, Mexico and 3Center of Genomics, UNAM, AP 565-A Cuernavaca, CP. 62100, Morelos, Mexico Received August 6, 2004; Revised and Accepted October 1, 2004 Downloaded from ABSTRACT out in this direction (1–3). The first step towards this goal is the recognition of all the genes regulated by a transcription factor Experimental data on the Escherichia coli (TF), i.e. its regulon. transcriptional regulatory system has been used in Computational approaches to recognizing the location of http://nar.oxfordjournals.org/ the past years to predict new regulatory elements regulatory sites in bacterial genomes include the use of weight (promoters, transcription factors (TFs), TFs’ binding matrices (4), phylogenetic footprinting (5), searching for sites and operons) within its genome. As more gen- statistical overrepresentation of oligonucleotides within a omes of gamma-proteobacteria are being sequenced, genome and clustering co-expressed genes in order to find the prediction of these elements in a growing number conserved patterns in their upstream regions (6,7), among of organisms has become more feasible, as a others (8,9). Recently, Tan et al. (10) proposed a new method- step towards the study of how different bacteria ology which brings together the advantages of both, the weight respond to environmental changes at the level of matrices and the phylogenetic footprinting approaches. at University of Luxembourg on May 12, 2014 In the past few years, a great amount of research has been transcriptional regulation. In this work, we present dedicated to computational prediction of important regulatory TRACTOR_DB (TRAnscription FaCTORs’ predicted elements in the Escherichia coli genome: promoters (11), binding sites in prokaryotic genomes), a relational data- operons (12), TFs (13) and TF binding sites (9). As more base that contains computational predictions of new bacterial genomes are sequenced, it is becoming more impor- members of 74 regulons in 17 gamma-proteobacterial tant to extend these efforts to other organisms, and decipher genomes. For these predictions we used a compar- their transcriptional regulatory networks by means of com- ative genomics approach regarding which several parative regulatory studies (10,14–19). proof-of-principle articles for large regulons have Our two major goals in this work were the production of a been published. TRACTOR_DB may be currently reliable set of binding site predictions for as many gamma- accessed at http://www.bioinfo.cu/Tractor_DB, http:// proteobacterial TFs as possible in 17 organisms of this division www.tractor.lncc.br/ or at http://www.cifn.unam.mx/ [E.coli K12 (NC_000913), Haemophilus influenzae (NC_000907), Salmonella typhi (NC_003198), Salmonella Computational_Genomics/tractorDB. Contact Email typhimurium LT2 (NC_003197), Shewanella oneidensis id is [email protected]. (NC_004347), Shigella flexneri 2a (NC_004337), Vibrio cholerae (NC_002505), Yersinia pestis KIM (NC_004088), Buchnera aphidicola (NC_004545), Pseudomonas aeruginosa (NC_002516), Pseudomonas syringae (NC_004578), Pasteur- INTRODUCTION ella multocida (NC_002663), Pseudomonas putida KT2440 One of the challenges of Functional Genomics is the identifi- (NC_002947), Vibrio parahaemolyticus (NC_004603), Vibrio cation of all the elements that take part in an organism’s vulnificus CMCP6 (NC_004459), Xanthomonas axonopodis transcriptional regulatory network. This is necessary to under- (NC_003919) and Xylella fastidiosa (NC_002488)], and stand how the cell reacts to environmental stimuli at the level the construction of a database (TRACTOR_DB, accessible of transcriptional regulation. Intense research is being carried at http://www.bioinfo.cu/Tractor_DB, http://www.tractor. *To whom correspondence should be addressed. Tel: +527 773 132063; Fax: +527 773 175581; Email: [email protected] The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use permissions, please contact [email protected]. ª 2005, the authors Nucleic Acids Research, Vol. 33, Database issue ª Oxford University Press 2005; all rights reserved Nucleic Acids Research, 2005, Vol. 33, Database issue D99 lncc.br and http://www.cifn.unam.mx/Computational_ four putative sites. These sets were then used to build a specific Genomics/tractorDB) with a user-friendly navigation interface model for each organism using CONSENSUS, which were containing these computationally predicted sites. Several then used to re-calculate the cutoffs and to re-scan the reg- recent papers have addressed the first goal; however, to our ulatory regions of each organism. knowledge, the present work is the most complete in terms of Summing up, the main features included in the approach the number of regulons and organisms that it comprises. Most proposed by Tan et al. (10) in order to extend it to as many TFs other studies have been limited to E.coli and H.influenzae (10) as possible were as follows: (i) the use of two different algo- or to one or few regulons (18,19). In contrast, the work by rithms to build the statistical models of each TF binding site: McCue et al. (20) predicted a large number of putative reg- the CONSENSUS (4) and the Gibbs-SAMPLER (22); (ii) the ulatory sites in 10 gamma-proteobacterial genomes, but could inclusion, within the training set used to build the model of not associate many of those sites to a given TF. each TF, of non-coding regions upstream the TUs of the organisms other than E.coli that are orthologous to those that in E.coli are known as regulon members (orthologous non-coding regions), along with E.coli known binding METHODS sequences of the TF; and (iii) the reconstruction of the models Downloaded from Selecting organisms and regulons for each TF in each organism after the prediction process, which allows the refinement of the search for new members We used the main ideas of the methodology proposed by Tan of the regulon within each genome. et al. (10) to predict new regulon members in 17 gamma- All the sites found for a given TF—in the eight initial proteobacterial genomes. Orthology information is used in organisms—after the second scanning using the rebuilt models this methodology to assess the biological significance of were aligned, using CONSENSUS, to produce a Positional http://nar.oxfordjournals.org/ putative regulon members. The methodology requires that Weight Matrix (PWM). Those PWMs were then used to the organisms selected be phylogenetically close to E.coli, scan the genomes of the other nine organisms for new putative since E.coli known binding sequences are used to build a binding sites. The orthology filtering process was done as statistical model of a TF’s binding site, and this model is described in the Supplementary Material, Section I.5, using subsequently used to predict new members of that regulon only the first eight organisms at the centre of the analysis. in the other organisms within the study. Hence, we included For a thorough description of the predictive methodo- in our study a group of organisms from different subdivisions logy see Section I.(1–6) and Figure 1 of the Supplementary of the gamma-proteobacteria subclass whose genomes were Material. completely sequenced. We started working with all TFs with at University of Luxembourg on May 12, 2014 at least one binding site known in E.coli. Designing and building the database Outline of the predictive methodology The database was designed and built following the relational model, and installed on a MySQL server. The web interface is Original training sets contain E.coli TFs’ binding sequences managed by a cgi PERL script that does all the work, from extracted from RegulonDB, version 4.0 (21). The first eight querying the database at the user’s request, to generating the organism referenced in the list above (those sharing at least dynamic web pages that form the interface. The design that we 30% of orthologous genes with E.coli) were used to construct have adopted makes it very easy to incorporate new instances the original training sets. A training set was built for each TF to the database, which may be very easily accommodated into with at least one known binding sequence in E.coli (21) and it the interface. Figure 1 shows the main relationships between included also those of orthologous non-coding regions when the tables that compose TRACTOR_DB and the most impor- less than 25 binding sequences were known in E.coli. We built tant queries carried out by the interface program. weight matrices only for TFs with training sets larger than four Several links in the dynamically generated web pages that sequences. Two statistical models were built for each TF: form the interface make the navigation easy and user-friendly. one using CONSENSUS, and the second using the Gibbs- The dynamic pages are linked to other databases such SAMPLER. The training sets were filtered twice as proposed as RegulonDB (21), EcoCyc (23) and the NCBI database.
Recommended publications
  • Escherichia Coli: Protein Families and Binding Sites M
    Functional determinants of transcription factors in Escherichia coli: protein families and binding sites M. Madan Babu and Sarah A. Teichmann DNA-binding transcription factors regulate the expression A reasonable hypothesis would be to assume that of genes near to where they bind. These factors can be activators, repressors and dual regulators are more activators or repressors of transcription, or both. Thus, a closely related to the other proteins within each fundamental question is what determines whether a regulatory group than to proteins in other groups. This transcription factor acts as an activator or repressor? would mean that there would be protein families and Previous research into this question found that a protein's domain architectures that were characteristic either of regulatory function is determined by one or more of the activators, repressors or dual regulators. Prag et al. [10] following factors: protein–proteinprotein–protein contacts, position of the and Perez-Rueda et al. [11] reported evidence to support DNA-binding domain in the protein primary sequence, this by relating the position of helix-turn-helix motifs altered DNA structure, and the position of its binding site along the primary sequence to a protein's regulatory on the DNA relative to the transcription start site. Although function. They used sequence comparison and profile there are many aspects specific to different transcription methods to locate the helix-turn-helix motifs that factors, in this work we demonstrate that, in general, in the represent DBDs, and found that the helix-turn-helix prokaryote Escherichia coli, a transcription factor's protein tends to be at the N terminus for repressors and at the C family is not indicative of its regulatory function, but the terminus for activators.
    [Show full text]
  • Regulondb (Version 6.0): Gene Regulation Model Of
    D120–D124 Nucleic Acids Research, 2008, Vol. 36, Database issue doi:10.1093/nar/gkm994 RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation Socorro Gama-Castro1, Vero´ nica Jime´ nez-Jacinto1, Martı´n Peralta-Gil1, Alberto Santos-Zavaleta1,Mo´ nica I. Pen˜ aloza-Spinola1, Bruno Contreras-Moreira1, Juan Segura-Salazar1, Luis Mun˜ iz-Rascado1, Irma Martı´nez-Flores1, Heladia Salgado1,Ce´ sar Bonavides-Martı´nez1, Cei Abreu-Goodger1, Carlos Rodrı´guez-Penagos1, Juan Miranda-Rı´os2, Enrique Morett2, Enrique Merino2, Araceli M. Huerta1, Luis Trevin˜ o-Quintanilla1 and Julio Collado-Vides1,* Downloaded from 1Program of Computational Genomics, Centro de Ciencias Geno´ micas, Universidad Nacional Auto´ noma de Me´ xico, A.P. 565-A, Cuernavaca, Morelos 62100, Mexico and 2Instituto de Biotecnologı´a, Universidad Nacional Auto´ noma de Me´ xico, A.P. 510-3, Cuernavaca, Morelos 62100, Mexico http://nar.oxfordjournals.org/ Received September 14, 2007; Revised October 19, 2007; Accepted October 22, 2007 ABSTRACT with more than 4000 curation notes, can now be RegulonDB (http://regulondb.ccg.unam.mx/) is the searched with the Textpresso text mining engine. primary reference database offering curated knowl- edge of the transcriptional regulatory network at Carleton University on June 21, 2015 of Escherichia coli K12, currently the best-known INTRODUCTION electronically encoded database of the genetic A major current task for bioinformatics is that of repre- regulatory network of any free-living organism. senting biological information into an electronic and This paper summarizes the improvements, new computable form. These computational representations biology and new features available in version 6.0.
    [Show full text]
  • Description Data File Programa De Genómica Computacional / Centro De Ciencias Genómicas
    Description Data File Programa de Genómica Computacional / Centro de Ciencias Genómicas DESCRIPTION DATA FILE 1. GENERAL INFORMATION. Title: LeuO and H-NS Data Set Reference: Shimada T, Bridier A, Briandet R, Ishihama A. Novel roles of LeuO in transcription regulation of E. coli genome: antagonistic interplay with the universal silencer H-NS. Mol Microbiol. 2011 Oct;82(2):378-97. PMID: 21883529 Contact person for this data set: Questions concerning the content of the data set that are raised by users of RegulonDB will be forwarded to this person. We would appreciate receiving a copy of the response to the user, so we can keep track of taking care of user requests. Person: RegulonDB staff Email address: [email protected] 2. DATA SET DESCRIPTION. Summary: LeuO Data Set. LeuO, the regulator of the leucine biosynthesis operon of Escherichia coli, is involved in the regulation of as-yet-unspecified genes that affect the stress response and pathogenesis expression. LeuO is involved in an antagonistic interplay with the universal silencer H-NS. Genomic SELEX screening was performed to identify the whole set of LeuO and H-NS regulation targets. A total of 140 LeuO-binding peaks (183 regulatory interactions) were identified by SELEX methodology, of which as many as 133 (95%) were found to contain the binding site of H-NS. In addition, a DNA microarray assay was carried out to identify genes affected by deletion or overexpression of the leuO gene. A total of 35 regulatory interactions were found via SELEX as well as microarray analysis. Based on the behavior of the regulated genes in the microarray analysis, from which it was determined that LeuO acts as an activator for 18 of them, 13 as a repressor and 4 as a dual function; we deduced that LeuO acts as a dual regulator.
    [Show full text]
  • Protein-DNA Interactions II
    Protein-DNA Interactions II Bio 5488 Overview 1. Review of information content and weight matrices 2. Online PWM resources 3. Motif discovery: Greedy algorithm 4. Motif discovery: Gibbs sampling Information Content EcoR1 Random Rap1 GAATTC GCCTAC TGTATGGGTG GAATTC ACATTC TGTTCGGATT GAATTC TCATTC TGCATGGGTG GAATTC CGACTC TGTACAGGTG GAATTC GAATTC TGTATGGATG GAATTC ATATCG TGTTCGGGTT GAATTC GAAATG TGTATGGGTG Information Content Matrix of Frequencies A: 0.1 0.7 0.2 0.3 0.4 0.1 C: 0.1 0.1 0.1 0.3 0.2 0.1 G: 0.1 0.1 0.2 0.1 0.2 0.1 T: 0.7 0.1 0.5 0.3 0.2 0.7 aka Relative Entropy, Kullbach-Liebler Distance Obtaining a Weight Matrix Hertz and Stormo, Bioinformatics (1999) Online tools: PWMs JASPAR (jaspar.genereg.net) open-access curated, non-redundant, PWMs for multi-cellular eukaryotes hundreds of PWMs JASPAR_FAM - metamodels for structural families Online tools: PWMs AC M00213 XX ID F$RAP1_C XX DT 05.09.1995 (created); dbo. DT 30.11.1995 (updated); ewi. CO Copyright (C), Biobase GmbH. XX NA RAP1 XX DE yeast repressor/activator protein 1 XX BF T00715 RAP1; Species: yeast, Saccharomyces cerevisiae. XX PO A C G T 01 6.89 2.40 0.79 4.74 W 02 8.80 0.00 4.58 1.43 R 03 7.20 6.83 0.79 0.00 M 04 14.82 0.00 0.00 0.00 A TRANSFAC 05 0.00 14.82 0.00 0.00 C 06 0.00 14.82 0.00 0.00 C Free access w/ reduced functionality 07 0.00 14.82 0.00 0.00 C 08 14.82 0.00 0.00 0.00 A to professional version 09 2.13 0.67 2.87 9.15 T 10 11.92 0.76 1.49 0.64 A 11 0.76 14.06 0.00 0.00 C 12 11.45 3.36 0.00 0.00 A Public version dates to 2005 13 0.79 5.64 0.00 7.10 Y 14 1.64 7.18 1.61 4.39 Y XX BA total weight of sequences: 14.82 XX CC consind generated matrix (random_expectation: 0.03) XX // (www.gene-regulation.com/pub/databases.html/#transfac) Online tools: PWMs RegulonDB (regulondb.ccg.unam.mx) Bacterial regulons and operons in E.
    [Show full text]
  • Synthetic CRISPR-Cas Gene Activators for Transcriptional Reprogramming in Bacteria
    Corrected: Author correction ARTICLE DOI: 10.1038/s41467-018-04901-6 OPEN Synthetic CRISPR-Cas gene activators for transcriptional reprogramming in bacteria Chen Dong1, Jason Fontana2, Anika Patel1, James M. Carothers 2,3,4 & Jesse G. Zalatan 1,2,4 Methods to regulate gene expression programs in bacterial cells are limited by the absence of effective gene activators. To address this challenge, we have developed synthetic bacterial transcriptional activators in E. coli by linking activation domains to programmable CRISPR-Cas 1234567890():,; DNA binding domains. Effective gene activation requires target sites situated in a narrow region just upstream of the transcription start site, in sharp contrast to the relatively flexible target site requirements for gene activation in eukaryotic cells. Together with existing tools for CRISPRi gene repression, these bacterial activators enable programmable control over multiple genes with simultaneous activation and repression. Further, the entire gene expression program can be switched on by inducing expression of the CRISPR-Cas system. This work will provide a foundation for engineering synthetic bacterial cellular devices with applications including diagnostics, therapeutics, and industrial biosynthesis. 1 Department of Chemistry, University of Washington, Seattle, WA 98195, USA. 2 Molecular Engineering & Sciences Institute, University of Washington, Seattle, WA 98195, USA. 3 Department of Chemical Engineering, University of Washington, Seattle, WA 98195, USA. 4 Center for Synthetic Biology, University of Washington, Seattle, WA 98195, USA. Correspondence and requests for materials should be addressed to J.M.C. (email: [email protected]) or to J.G.Z. (email: [email protected]) NATURE COMMUNICATIONS | (2018) 9:2489 | DOI: 10.1038/s41467-018-04901-6 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04901-6 acteria are attractive targets for a wide variety of engi- in bacteria suggests that RpoZ may not be effective as a general Bneering applications.
    [Show full text]
  • A User-Friendly Tool for Improving Bacterial Genome Annotation Through Analysis of Transcription Control Signals
    SigmoID: a user-friendly tool for improving bacterial genome annotation through analysis of transcription control signals Yevgeny Nikolaichik and Aliaksandr U. Damienikan Department of Molecular Biology, Belarusian State University, Minsk, Belarus ABSTRACT The majority of bacterial genome annotations are currently automated and based on a `gene by gene' approach. Regulatory signals and operon structures are rarely taken into account which often results in incomplete and even incorrect gene function assignments. Here we present SigmoID, a cross-platform (OS X, Linux and Windows) open-source application aiming at simplifying the identification of transcription regulatory sites (promoters, transcription factor binding sites and terminators) in bac- terial genomes and providing assistance in correcting annotations in accordance with regulatory information. SigmoID combines a user-friendly graphical interface to well known command line tools with a genome browser for visualising regulatory elements in genomic context. Integrated access to online databases with regulatory information (RegPrecise and RegulonDB) and web-based search engines speeds up genome analysis and simplifies correction of genome annotation. We demonstrate some features of SigmoID by constructing a series of regulatory protein binding site profiles for two groups of bacteria: Soft Rot Enterobacteriaceae (Pectobacterium and Dickeya spp.) and Pseudomonas spp. Furthermore, we inferred over 900 transcription factor binding sites and alternative sigma factor promoters in the annotated genome of Pectobacterium atrosepticum. These regulatory signals control putative transcription units covering about 40% of the P. atrosepticum chromosome. Reviewing the annotation in cases where Submitted 26 January 2016 it didn't fit with regulatory information allowed us to correct product and gene names Accepted 29 April 2016 for over 300 loci.
    [Show full text]
  • Data Resources and Mining Tools for Reconstructing Gene Regulatory Networks in Lactococcus Lactis
    Japanese Journal of Lactic Acid Bacteria Copyright © 2011, Japan Society for Lactic Acid Bacteria Review Data resources and mining tools for reconstructing gene regulatory networks in Lactococcus lactis Anne de Jong1,2,3), Jan Kok1,2,3) and Oscar P. Kuipers1,2,3)* 1) Department of Molecular Genetics, University of Groningen, Groningen Biomolecular Sciences and Biotechnology Institute, Nijenborgh 7, 9747 AG Groningen, the Netherlands 2) Top Institute Food and Nutrition, Wageningen, the Netherlands 3) The Netherlands Kluyver Centre for Genomics of Industrial Fermentations / NCSB, Delft, the Netherlands. Abstract DNA is the blueprint and template for tRNA, rRNA, sRNA and mRNA synthesis in all living organisms. Subsequently, the mRNA is translated to proteins a process in which other RNA types, compounds and proteins play important roles. The regulation of transcription and translation is a delicate equilibrium of thousands of factors interacting within an organism. These factors vary from compound- and protein concentrations, to their activity and stability to physical parameters like temperature and pH. Especially the transcription factors and transcription factor binding sites play a central role in the regulation of gene expression. For lactococci lactis several data sets on gene expression and regulation are available in databases or literature. Furthermore, the number of (in-) complete sequenced lactococci genomes is increasing, while the majority of the strains will only be studied in silico. This review focuses on the visualization and mining of the complex transcription and translation systems via interactive graphics and network reconstruction tools and the amalgamation of DNA microarray data with biological data that together lead to the reconstruction of gene regulatory networks (GRN).
    [Show full text]
  • Nucleic Acids Research, 2004, Vol
    Nucleic Acids Research, 2004, Vol. 32, Database issue D303±D306 DOI: 10.1093/nar/gkh140 RegulonDB (version 4.0): transcriptional regulation, operon organization and growth conditions in Escherichia coli K-12 Heladia Salgado, Socorro Gama-Castro, Agustino MartõÂnez-Antonio, Edgar DõÂaz-Peredo, Fabiola SaÂnchez-Solano, MartõÂn Peralta-Gil, Del®no Garcia-Alonso, Vero nica JimeÂnez-Jacinto, Alberto Santos-Zavaleta, CeÂsar Bonavides-MartõÂnez and Julio Collado-Vides* Program of Computational Genomics, CIFN, UNAM. A.P. 565-A Cuernavaca, Morelos 62100, Mexico Received September 12, 2003; Revised October 8, 2003; Accepted October 29, 2003 ABSTRACT the molecular biology of this cell. It may be that this is the cell for which we know more about the function of its genes, its RegulonDB is the primary database of the major metabolism and transcriptional regulation. This knowledge is international maintained curation of original litera- the foundation for the proposal within the International E.coli ture with experimental knowledge about the Alliance, to achieve in E.coli, as a long-term goal, the ®rst elements and interactions of the network of tran- whole-cell model (1). We contribute to this international effort scriptional regulation in Escherichia coli K-12. This with RegulonDB, the primary database of the major inter- includes mechanistic information about operon national maintained curation of original literature with organization and their decomposition into transcrip- experimental knowledge about the elements and interactions tion units
    [Show full text]
  • Dataset Description
    Promoter_from_454_Dataset.txt Programa de Genómica Computacional / Centro de Ciencias Genómicas DATASET DESCRIPTION DATASET: Promoter_from_454_Dataset.txt. Experimental determination of Transcription Star Sites (TSSs) with High Throughput Pyrosequencing Strategy (HTPS) and computational promoter prediction. Contact person for this dataset: Person: RegulonDB team Email address: [email protected] Type of dataset: Experimental TSS mapping and computational promoter prediction Reference: Alfredo Mendoza, Leticia Olvera, Maricela Olvera, Ricardo Grande, Veronica Jiménez-Jacinto, Blanca Taboada, Leticia Vega, Katy Juárez, Heladia Salgado, Araceli Huerta, Julio Collado-Vides and Enrique Morett. (2009). Genome-wide Identification of Transcription Start Sites, Promoters and Transcription Factor Binding Sites in E. coli . PLoS ONE. 4(10):e7526. Description: This file contains: High Throughput Pyrosequencing Strategy (HTPS) Data, with and without previously know TSSs. For many, promoter region was predicted in this work. Summary: This file has a collection of more than 1491 transcription start sites (TSSs) that have been experimentally determined with high precision in Escherichia coli using an unbiased High Throughput Pyrosequencing Strategy (HTPS). From this collection, about 148 corresponded to previously reported TSSs, which helped us to benchmark both our methodologies and the accuracy of the previous mapping experiments. The other ca 1300 TSSs mapped belong to about 900 different genes, many of them with no assigned function. Página 1 de 4 Promoter_from_454_Dataset.txt Programa de Genómica Computacional / Centro de Ciencias Genómicas Methods Version of programs: We use blast version BLASTN 2.2.18 For the σ38 -10 element a new matrix was constructed with the WCONSENSUS program (version 6d), utilizing the nucleotide sequences of 71 promoters of E.
    [Show full text]
  • Investigating Binding Site Ratios of Transcription Factors Center for Advanced Biotechnology and Medicine, Stock Lab Zeyue Li Rutgers University
    Investigating Binding Site Ratios of Transcription Factors Center for Advanced Biotechnology and Medicine, Stock Lab Zeyue Li Rutgers University Abstract Results Bacterial signal transduction is controlled by transcription factors (TFs), which bind to specific binding sites on bacterial chromosomes.3 Expression levels of TFs are often presumed to be in Protein Abundance Data from PaxDB TFBS Identified by High Throughput Methods great excess to their binding sites, leading to high ratios of TF concentrations to the numbers of TF binding sites. Previous studies have suggested a median ratio of 10.4 Here we used the updated binding data from regulonDB and multiple protein abundance databases to reexamine the ratios for E. coli TFs. We discovered that TFs are unlikely to be in great excess to their binding sites. Further, we believe that the binding site data from regulonDB is biased toward the intergenic promoter regions and there are more binding sites that have not been discovered, especially those within open reading frames. To better estimate the number of binding sites, we took a bioinformatics direction, using multiple sources, including high-throughput data, gSELEX data, and computational predictions using FIMO scan of the bacterial genome. Although different sources vary in their reports of the number of binding sites, all indicated at least two fold higher numbers of binding sites than those from regulonDB. Overall, we found that repressors tend to have a higher ratio compared to activators and dual function TFs, and Figure 1: The median ratio of TF concentration (PaxDB) to TFBS is 7.23 we found that the originally reported ratio is higher than what studies with newer technology Figure 7: Box plot of the ratio of high throughput data to regulonDB data list.
    [Show full text]
  • Determination and Dissection of DNA-Binding Specificity for The
    International Journal of Molecular Sciences Article Determination and Dissection of DNA-Binding Specificity for the Thermus thermophilus HB8 Transcriptional Regulator TTHB099 Kristi Moncja and Michael W. Van Dyke * Department of Chemistry and Biochemistry, Kennesaw State University, Kennesaw, GA 30144, USA; [email protected] * Correspondence: [email protected]; Tel.: +1-470-578-2793 Received: 14 October 2020; Accepted: 24 October 2020; Published: 26 October 2020 Abstract: Transcription factors (TFs) have been extensively researched in certain well-studied organisms, but far less so in others. Following the whole-genome sequencing of a new organism, TFs are typically identified through their homology with related proteins in other organisms. However, recent findings demonstrate that structurally similar TFs from distantly related bacteria are not usually evolutionary orthologs. Here we explore TTHB099, a cAMP receptor protein (CRP)-family TF from the extremophile Thermus thermophilus HB8. Using the in vitro iterative selection method Restriction Endonuclease Protection, Selection and Amplification (REPSA), we identified the preferred DNA-binding motif for TTHB099, 50–TGT(A/g)NBSYRSVN(T/c)ACA–30, and mapped potential binding sites and regulated genes within the T. thermophilus HB8 genome. Comparisons with expression profile data in TTHB099-deficient and wild type strains suggested that, unlike E. coli CRP (CRPEc), TTHB099 does not have a simple regulatory mechanism. However, we hypothesize that TTHB099 can be a dual-regulator similar to CRPEc. Keywords: bioinformatics; biolayer interferometry (BLI); electrophoretic mobility shift assay (EMSA); extremophile; protein-DNA binding; type IIS restriction endonuclease 1. Introduction Transcription factors (TFs) are DNA-binding proteins that allow for modulation of transcription initiation in response to intracellular and extracellular changes.
    [Show full text]
  • A Biophysical Approach to Transcription Factor Binding Site Discovery Marko Djordjevic,1 Anirvan M
    Downloaded from genome.cshlp.org on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press Article A Biophysical Approach to Transcription Factor Binding Site Discovery Marko Djordjevic,1 Anirvan M. Sengupta,2 and Boris I. Shraiman2,3 1Department of Physics, Columbia University, New York, New York 10025, USA; 2Department of Physics and BioMaPSInstitute, Rutgers University, Piscataway, New Jersey 08854, USA Identification of transcription factor binding sites within regulatory segments of genomic DNA is an important step toward understanding of the regulatory circuits that control expression of genes. Here, we describe a novel bioinformatics method that bases classification of potential binding sites explicitly on the estimate of sequence-specific binding energy of a given transcription factor. The method also estimates the chemical potential of the factor that defines the threshold of binding. In contrast with the widely used information-theoretic weight matrix method, the new approach correctly describes saturation in the transcription factor/DNA binding probability. This results in a significant improvement in the number of expected false positives, particularly in the ubiquitous case of low-specificity factors. In the strong binding limit, the algorithm is related to the “support vector machine” approach to pattern recognition. The new method is used to identify likely genomic binding sites for the E. coli transcription factors collected in the DPInteract database. In addition, for CRP (a global regulatory factor), the likely regulatory modality (i.e., repressor or activator) of predicted binding sites is determined. [Supplemental material is available online at www.genome.org. The complete list of predicted sites may be found at http://www.biomaps.rutgers.edu/bioinformatics/QPMEME.htm.] Molecular biology has been revolutionized by the availability of some of the candidate regulatory sites.
    [Show full text]