Data Resources and Mining Tools for Reconstructing Gene Regulatory Networks in Lactococcus Lactis

Total Page:16

File Type:pdf, Size:1020Kb

Data Resources and Mining Tools for Reconstructing Gene Regulatory Networks in Lactococcus Lactis Japanese Journal of Lactic Acid Bacteria Copyright © 2011, Japan Society for Lactic Acid Bacteria Review Data resources and mining tools for reconstructing gene regulatory networks in Lactococcus lactis Anne de Jong1,2,3), Jan Kok1,2,3) and Oscar P. Kuipers1,2,3)* 1) Department of Molecular Genetics, University of Groningen, Groningen Biomolecular Sciences and Biotechnology Institute, Nijenborgh 7, 9747 AG Groningen, the Netherlands 2) Top Institute Food and Nutrition, Wageningen, the Netherlands 3) The Netherlands Kluyver Centre for Genomics of Industrial Fermentations / NCSB, Delft, the Netherlands. Abstract DNA is the blueprint and template for tRNA, rRNA, sRNA and mRNA synthesis in all living organisms. Subsequently, the mRNA is translated to proteins a process in which other RNA types, compounds and proteins play important roles. The regulation of transcription and translation is a delicate equilibrium of thousands of factors interacting within an organism. These factors vary from compound- and protein concentrations, to their activity and stability to physical parameters like temperature and pH. Especially the transcription factors and transcription factor binding sites play a central role in the regulation of gene expression. For lactococci lactis several data sets on gene expression and regulation are available in databases or literature. Furthermore, the number of (in-) complete sequenced lactococci genomes is increasing, while the majority of the strains will only be studied in silico. This review focuses on the visualization and mining of the complex transcription and translation systems via interactive graphics and network reconstruction tools and the amalgamation of DNA microarray data with biological data that together lead to the reconstruction of gene regulatory networks (GRN). These networks are a valuable source of information and knowledge that can be used for studying L. lactis physiology and provide clues for improving industrial strains by either non-GMO methods (strain selection, fermentation condition) or by specific engineering. Key words: DNA microarray, Gene Regulatory Network, Time series, Transcription factor, Transcription factor binding site, Lactococcus lactis Lactococcus lactis products as a rich source for proteins, small peptides, amino acids, fatty acids, calcium and other valuable Since 10.000 years humans consume fermented milk nutrients. The milk sugar lactose is fermented to lactic acid; the consequent acidification prevents food spoilage and is probably the major driving force of *To whom correspondence should be addressed. the success of fermented milk products. Although Phone : +31-50-3632092 nowadays other food preservation methods are applied, Fax : +31-50-3632348 E-mail : [email protected] the acidification of the product is still the major growth : [email protected] inhibitor of bacteria other than Lactococci. The global : [email protected] world market of milk-related products is estimated —3— Vol. 22, No. 1 Japanese Journal of Lactic Acid Bacteria at € 240 billion (IMIS Euromonitor 2006) explaining smaller genes. Prediction of a small ORF (smaller than the high interest of the diary industry in research on 120-150 bases) is still very complicated. In annotation food process management and improvement. To gain processes small ORFs are habitually discarded when more insight in flavour formation, intensive research they do not show homology to known proteins. Of is taking place to unraveling genomes, transcriptomes, course, in this way, novel small genes will never proteomes and (bio-) chemical pathways. The Holy be discovered. BAGEL2 9), a bacteriocin mining Grail would be to build a predictive in silico model of tool, circumvents this issue by taking the genomic each bacterial strain involved in milk fermentation. context into account and is able to discover small Probably this holistic result will not be obtained ORFs encoding novel lantibiotics. The main resources soon, not even for one of the organisms, but recent for annotated genomes, NCBI, EBI and JGI, offer work has already led to models describing various multiple ORF predictions tools made by Glimmer3 and relatively simple processes in the bacterial cell. The PRODIGAL. basis of this research is the availability of genome sequences of lactic acid bacteria but, although many Operon prediction L. lactis genomes have been sequenced, mainly by industrial companies, only a few are available in the An operon is defined as a cluster of genes that public domain (from the strains MG1363, IL1403, SK11 are translated from one mRNA, which is thus called and KF147) at“The Integrated Microbial Genomes a polycistronic messenger, and are under control of (IMG) system”1). In this review we will discuss the the same promoter. To accurately predict operon state of the art bioinformatics tools that facilitate the purely on the basis of genome sequences is still a reconstruction and visualization of gene regulatory challenging task. Using gene direction, gene spacing networks in L. lactis. and transcription terminator prediction programs in combination with comparative genomics, up to 95% Gene prediction accuracy of operon prediction can be achieved 10). Currently, databases provide operons predicted Normally, splicing of mRNA does not occur in in publically accessible genomes. For the de novo prokaryotes, which would suggest that prediction annotation or re-annotation of operons, UNIPOP of a protein encoding gene (open reading frame provides an easy, quick and accurate way 11, 12). (ORF)) is a simple and straight forward process. DOOR 13) (Database of prOkaryotic OpeRons) using the Although prediction methods like GeneMarkHMM 2), algorithm developed by Dam 14)) which is believed to Glimmer 3 3), TriTISA 4), TICO 5), Easygene 6), MED be the most accurate prediction method as described 2.0 7) and PRODIGAL 8) predict up to 90% of the genes in an intensive comparison study 15). MicrobesOnline is correctly in most prokaryotes, still quite a number of a product of the Virtual Institute for Microbial Stress genes are incorrect predicted or missed 8). All methods and Survival; it provides operons for 620 genomes struggle with predicting correct translation start while OperonDB contains operons for 550 genomes, point (AUG, GUG and UUG) and use Shine-Dalgarno ODB 16) for 203 genomes. RegulonDB 17) basically Ribosomal Binding Site (RBS) motifs to improve provides E. coli operons and DBTBS 18) those for B. the prediction of the translation initiation site. An subtilis. While operons in OperonDB 19), Microbes important difference between coding and non-coding Online and ODB 16) are only predicted, operons defined DNA regions is the GC bias and this phenomenon has in RegulonDB and DBTBS are mainly based on been used to improve the ORF prediction. For each laboratory experiments. organism this bias can be calculated and is used to plot reading frame information for detection of the Transcriptomics spurious upstream gene region. Furthermore, modern prediction methods use an ORF length threshold of 90- Bacteria often encounter changes in their 120 bp, because of the high rate of false positives for environment and will adapt the expression of genes —4— Jpn. J. Lactic Acid Bact. Vol. 22, No. 1 and activity of proteins to optimize growth, which Transcription regulators in Lactococcus lactis is of utmost importance to survive in a competitive niche. By sensing the changes in the intracellular In the genome of L. lactis subsp. cremoris MG1363 and/or extracellular environment, the cell can adjust 154 genes are annotated as ncodong TFs. Among the transcription of genes and subsequently change these, 32 have been studied in either L. lactis MG1363 its protein content, to cope with the new situation. or in L. lactis IL1403 and among these, 7 genes encode Transcription of genes (gene expression) is tightly TFs that are members of a two component system regulated and mostly mediated by sigma factors and (Table 1). Another 55 TFs have high homology to other trans-acting transcription factors (transcription regulators in other bacteria with known function regulator proteins; TFs). Furthermore, micro RNA’s (not shown) while 68 are unknown TFs annotated as and T-boxes or S-boxes, can be involved in mediating putative, based on conserved protein motifs involved the level of transcription. The TFs can be activated or in DNA-binding. deactivated when they directly or indirectly“sense” changes in concentration of physical constants (e.g. Regulons temperature), ions or other molecules in the cell. Besides these types of TFs, two component signal Genes or operons that are under control of the same transduction systems (e.g. by phosphorelay) exist TF are member of a regulon. Although prediction that specifically react on external signals 20). DNA methods for regulons have been substantially microarray studies show the expression of all genes Table 1. Transcriptional regulators studied in L. lactis at a defined state of the cells transcriptome. Different MG1363 and/or IL1403 growth rates, stress or other conditions and also the Gene Literature co-expression of genes can be used to build a gene- AhrC MG1363 57, 58) 57, 58) to-gene interactive network including their deduced ArgR MG1363 BusR 59, 60) proteins and subsequent metabolic compounds. The 61) CcpA MG1363 co-regulation of a collection of genes by the same TF CodY MG1363 22, 62) is called a regulon. ComX 63) CopR IL1403 64) CtsR MG1363 65, 66) Transcription factors FhuR IL1403 67) FlpA MG1363 68) 68) Transcription of DNA to mRNA can be activated FlpB MG1363 FruR 69) or repressed when a TF binds to a specific cis- 70, 71) GadR regulatory DNA element (TFBS) in the promoter GntR MG1363 72) 73) region of an operon. The DNA binding region of HdiR MG1363 HisZ 74, 75) TFs ofthen consists of a conserved Helix-Turn-Helix LlrA MG1363 76) secondary structure, a motif that is prevalent in TFs LlrB MG1363 76) of prokaryotes 21). Furthermore, it is common that a LlrC MG1363 76) 77) co-factor molecule is required to activate a TF e.g. LlrD LlrE MG1363 76) branched-chain amino acids (BCAAs) for CodY of L.
Recommended publications
  • Escherichia Coli: Protein Families and Binding Sites M
    Functional determinants of transcription factors in Escherichia coli: protein families and binding sites M. Madan Babu and Sarah A. Teichmann DNA-binding transcription factors regulate the expression A reasonable hypothesis would be to assume that of genes near to where they bind. These factors can be activators, repressors and dual regulators are more activators or repressors of transcription, or both. Thus, a closely related to the other proteins within each fundamental question is what determines whether a regulatory group than to proteins in other groups. This transcription factor acts as an activator or repressor? would mean that there would be protein families and Previous research into this question found that a protein's domain architectures that were characteristic either of regulatory function is determined by one or more of the activators, repressors or dual regulators. Prag et al. [10] following factors: protein–proteinprotein–protein contacts, position of the and Perez-Rueda et al. [11] reported evidence to support DNA-binding domain in the protein primary sequence, this by relating the position of helix-turn-helix motifs altered DNA structure, and the position of its binding site along the primary sequence to a protein's regulatory on the DNA relative to the transcription start site. Although function. They used sequence comparison and profile there are many aspects specific to different transcription methods to locate the helix-turn-helix motifs that factors, in this work we demonstrate that, in general, in the represent DBDs, and found that the helix-turn-helix prokaryote Escherichia coli, a transcription factor's protein tends to be at the N terminus for repressors and at the C family is not indicative of its regulatory function, but the terminus for activators.
    [Show full text]
  • Regulondb (Version 6.0): Gene Regulation Model Of
    D120–D124 Nucleic Acids Research, 2008, Vol. 36, Database issue doi:10.1093/nar/gkm994 RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation Socorro Gama-Castro1, Vero´ nica Jime´ nez-Jacinto1, Martı´n Peralta-Gil1, Alberto Santos-Zavaleta1,Mo´ nica I. Pen˜ aloza-Spinola1, Bruno Contreras-Moreira1, Juan Segura-Salazar1, Luis Mun˜ iz-Rascado1, Irma Martı´nez-Flores1, Heladia Salgado1,Ce´ sar Bonavides-Martı´nez1, Cei Abreu-Goodger1, Carlos Rodrı´guez-Penagos1, Juan Miranda-Rı´os2, Enrique Morett2, Enrique Merino2, Araceli M. Huerta1, Luis Trevin˜ o-Quintanilla1 and Julio Collado-Vides1,* Downloaded from 1Program of Computational Genomics, Centro de Ciencias Geno´ micas, Universidad Nacional Auto´ noma de Me´ xico, A.P. 565-A, Cuernavaca, Morelos 62100, Mexico and 2Instituto de Biotecnologı´a, Universidad Nacional Auto´ noma de Me´ xico, A.P. 510-3, Cuernavaca, Morelos 62100, Mexico http://nar.oxfordjournals.org/ Received September 14, 2007; Revised October 19, 2007; Accepted October 22, 2007 ABSTRACT with more than 4000 curation notes, can now be RegulonDB (http://regulondb.ccg.unam.mx/) is the searched with the Textpresso text mining engine. primary reference database offering curated knowl- edge of the transcriptional regulatory network at Carleton University on June 21, 2015 of Escherichia coli K12, currently the best-known INTRODUCTION electronically encoded database of the genetic A major current task for bioinformatics is that of repre- regulatory network of any free-living organism. senting biological information into an electronic and This paper summarizes the improvements, new computable form. These computational representations biology and new features available in version 6.0.
    [Show full text]
  • Description Data File Programa De Genómica Computacional / Centro De Ciencias Genómicas
    Description Data File Programa de Genómica Computacional / Centro de Ciencias Genómicas DESCRIPTION DATA FILE 1. GENERAL INFORMATION. Title: LeuO and H-NS Data Set Reference: Shimada T, Bridier A, Briandet R, Ishihama A. Novel roles of LeuO in transcription regulation of E. coli genome: antagonistic interplay with the universal silencer H-NS. Mol Microbiol. 2011 Oct;82(2):378-97. PMID: 21883529 Contact person for this data set: Questions concerning the content of the data set that are raised by users of RegulonDB will be forwarded to this person. We would appreciate receiving a copy of the response to the user, so we can keep track of taking care of user requests. Person: RegulonDB staff Email address: [email protected] 2. DATA SET DESCRIPTION. Summary: LeuO Data Set. LeuO, the regulator of the leucine biosynthesis operon of Escherichia coli, is involved in the regulation of as-yet-unspecified genes that affect the stress response and pathogenesis expression. LeuO is involved in an antagonistic interplay with the universal silencer H-NS. Genomic SELEX screening was performed to identify the whole set of LeuO and H-NS regulation targets. A total of 140 LeuO-binding peaks (183 regulatory interactions) were identified by SELEX methodology, of which as many as 133 (95%) were found to contain the binding site of H-NS. In addition, a DNA microarray assay was carried out to identify genes affected by deletion or overexpression of the leuO gene. A total of 35 regulatory interactions were found via SELEX as well as microarray analysis. Based on the behavior of the regulated genes in the microarray analysis, from which it was determined that LeuO acts as an activator for 18 of them, 13 as a repressor and 4 as a dual function; we deduced that LeuO acts as a dual regulator.
    [Show full text]
  • Protein-DNA Interactions II
    Protein-DNA Interactions II Bio 5488 Overview 1. Review of information content and weight matrices 2. Online PWM resources 3. Motif discovery: Greedy algorithm 4. Motif discovery: Gibbs sampling Information Content EcoR1 Random Rap1 GAATTC GCCTAC TGTATGGGTG GAATTC ACATTC TGTTCGGATT GAATTC TCATTC TGCATGGGTG GAATTC CGACTC TGTACAGGTG GAATTC GAATTC TGTATGGATG GAATTC ATATCG TGTTCGGGTT GAATTC GAAATG TGTATGGGTG Information Content Matrix of Frequencies A: 0.1 0.7 0.2 0.3 0.4 0.1 C: 0.1 0.1 0.1 0.3 0.2 0.1 G: 0.1 0.1 0.2 0.1 0.2 0.1 T: 0.7 0.1 0.5 0.3 0.2 0.7 aka Relative Entropy, Kullbach-Liebler Distance Obtaining a Weight Matrix Hertz and Stormo, Bioinformatics (1999) Online tools: PWMs JASPAR (jaspar.genereg.net) open-access curated, non-redundant, PWMs for multi-cellular eukaryotes hundreds of PWMs JASPAR_FAM - metamodels for structural families Online tools: PWMs AC M00213 XX ID F$RAP1_C XX DT 05.09.1995 (created); dbo. DT 30.11.1995 (updated); ewi. CO Copyright (C), Biobase GmbH. XX NA RAP1 XX DE yeast repressor/activator protein 1 XX BF T00715 RAP1; Species: yeast, Saccharomyces cerevisiae. XX PO A C G T 01 6.89 2.40 0.79 4.74 W 02 8.80 0.00 4.58 1.43 R 03 7.20 6.83 0.79 0.00 M 04 14.82 0.00 0.00 0.00 A TRANSFAC 05 0.00 14.82 0.00 0.00 C 06 0.00 14.82 0.00 0.00 C Free access w/ reduced functionality 07 0.00 14.82 0.00 0.00 C 08 14.82 0.00 0.00 0.00 A to professional version 09 2.13 0.67 2.87 9.15 T 10 11.92 0.76 1.49 0.64 A 11 0.76 14.06 0.00 0.00 C 12 11.45 3.36 0.00 0.00 A Public version dates to 2005 13 0.79 5.64 0.00 7.10 Y 14 1.64 7.18 1.61 4.39 Y XX BA total weight of sequences: 14.82 XX CC consind generated matrix (random_expectation: 0.03) XX // (www.gene-regulation.com/pub/databases.html/#transfac) Online tools: PWMs RegulonDB (regulondb.ccg.unam.mx) Bacterial regulons and operons in E.
    [Show full text]
  • A Database of Regulatory Networks in Gamma-Proteobacterial Genomes Abel D
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Open Repository and Bibliography - Luxembourg D98–D102 Nucleic Acids Research, 2005, Vol. 33, Database issue doi:10.1093/nar/gki054 TRACTOR_DB: a database of regulatory networks in gamma-proteobacterial genomes Abel D. Gonza´lez, Vladimir Espinosa, Ana T. Vasconcelos1, Ernesto Pe´rez-Rueda2 and Julio Collado-Vides3,* National Bioinformatics Center, Industria y San Jose´, Capitolio Nacional, CP. 10200, Habana Vieja, Habana, Cuba, 1National Laboratory for Scientific Computing, Avenue Getulio Vargas 333, Quitandinha, CEP 25651-075, Petropolis, Rio de Janeiro, Brazil, 2Depto. de Ingenierı`a Celular y Biocata´lisis, IBT-UNAM, Cuernavaca, Morelos, Mexico and 3Center of Genomics, UNAM, AP 565-A Cuernavaca, CP. 62100, Morelos, Mexico Received August 6, 2004; Revised and Accepted October 1, 2004 Downloaded from ABSTRACT out in this direction (1–3). The first step towards this goal is the recognition of all the genes regulated by a transcription factor Experimental data on the Escherichia coli (TF), i.e. its regulon. transcriptional regulatory system has been used in Computational approaches to recognizing the location of http://nar.oxfordjournals.org/ the past years to predict new regulatory elements regulatory sites in bacterial genomes include the use of weight (promoters, transcription factors (TFs), TFs’ binding matrices (4), phylogenetic footprinting (5), searching for sites and operons) within its genome. As more gen- statistical overrepresentation of oligonucleotides within a omes of gamma-proteobacteria are being sequenced, genome and clustering co-expressed genes in order to find the prediction of these elements in a growing number conserved patterns in their upstream regions (6,7), among of organisms has become more feasible, as a others (8,9).
    [Show full text]
  • Synthetic CRISPR-Cas Gene Activators for Transcriptional Reprogramming in Bacteria
    Corrected: Author correction ARTICLE DOI: 10.1038/s41467-018-04901-6 OPEN Synthetic CRISPR-Cas gene activators for transcriptional reprogramming in bacteria Chen Dong1, Jason Fontana2, Anika Patel1, James M. Carothers 2,3,4 & Jesse G. Zalatan 1,2,4 Methods to regulate gene expression programs in bacterial cells are limited by the absence of effective gene activators. To address this challenge, we have developed synthetic bacterial transcriptional activators in E. coli by linking activation domains to programmable CRISPR-Cas 1234567890():,; DNA binding domains. Effective gene activation requires target sites situated in a narrow region just upstream of the transcription start site, in sharp contrast to the relatively flexible target site requirements for gene activation in eukaryotic cells. Together with existing tools for CRISPRi gene repression, these bacterial activators enable programmable control over multiple genes with simultaneous activation and repression. Further, the entire gene expression program can be switched on by inducing expression of the CRISPR-Cas system. This work will provide a foundation for engineering synthetic bacterial cellular devices with applications including diagnostics, therapeutics, and industrial biosynthesis. 1 Department of Chemistry, University of Washington, Seattle, WA 98195, USA. 2 Molecular Engineering & Sciences Institute, University of Washington, Seattle, WA 98195, USA. 3 Department of Chemical Engineering, University of Washington, Seattle, WA 98195, USA. 4 Center for Synthetic Biology, University of Washington, Seattle, WA 98195, USA. Correspondence and requests for materials should be addressed to J.M.C. (email: [email protected]) or to J.G.Z. (email: [email protected]) NATURE COMMUNICATIONS | (2018) 9:2489 | DOI: 10.1038/s41467-018-04901-6 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04901-6 acteria are attractive targets for a wide variety of engi- in bacteria suggests that RpoZ may not be effective as a general Bneering applications.
    [Show full text]
  • A User-Friendly Tool for Improving Bacterial Genome Annotation Through Analysis of Transcription Control Signals
    SigmoID: a user-friendly tool for improving bacterial genome annotation through analysis of transcription control signals Yevgeny Nikolaichik and Aliaksandr U. Damienikan Department of Molecular Biology, Belarusian State University, Minsk, Belarus ABSTRACT The majority of bacterial genome annotations are currently automated and based on a `gene by gene' approach. Regulatory signals and operon structures are rarely taken into account which often results in incomplete and even incorrect gene function assignments. Here we present SigmoID, a cross-platform (OS X, Linux and Windows) open-source application aiming at simplifying the identification of transcription regulatory sites (promoters, transcription factor binding sites and terminators) in bac- terial genomes and providing assistance in correcting annotations in accordance with regulatory information. SigmoID combines a user-friendly graphical interface to well known command line tools with a genome browser for visualising regulatory elements in genomic context. Integrated access to online databases with regulatory information (RegPrecise and RegulonDB) and web-based search engines speeds up genome analysis and simplifies correction of genome annotation. We demonstrate some features of SigmoID by constructing a series of regulatory protein binding site profiles for two groups of bacteria: Soft Rot Enterobacteriaceae (Pectobacterium and Dickeya spp.) and Pseudomonas spp. Furthermore, we inferred over 900 transcription factor binding sites and alternative sigma factor promoters in the annotated genome of Pectobacterium atrosepticum. These regulatory signals control putative transcription units covering about 40% of the P. atrosepticum chromosome. Reviewing the annotation in cases where Submitted 26 January 2016 it didn't fit with regulatory information allowed us to correct product and gene names Accepted 29 April 2016 for over 300 loci.
    [Show full text]
  • Nucleic Acids Research, 2004, Vol
    Nucleic Acids Research, 2004, Vol. 32, Database issue D303±D306 DOI: 10.1093/nar/gkh140 RegulonDB (version 4.0): transcriptional regulation, operon organization and growth conditions in Escherichia coli K-12 Heladia Salgado, Socorro Gama-Castro, Agustino MartõÂnez-Antonio, Edgar DõÂaz-Peredo, Fabiola SaÂnchez-Solano, MartõÂn Peralta-Gil, Del®no Garcia-Alonso, Vero nica JimeÂnez-Jacinto, Alberto Santos-Zavaleta, CeÂsar Bonavides-MartõÂnez and Julio Collado-Vides* Program of Computational Genomics, CIFN, UNAM. A.P. 565-A Cuernavaca, Morelos 62100, Mexico Received September 12, 2003; Revised October 8, 2003; Accepted October 29, 2003 ABSTRACT the molecular biology of this cell. It may be that this is the cell for which we know more about the function of its genes, its RegulonDB is the primary database of the major metabolism and transcriptional regulation. This knowledge is international maintained curation of original litera- the foundation for the proposal within the International E.coli ture with experimental knowledge about the Alliance, to achieve in E.coli, as a long-term goal, the ®rst elements and interactions of the network of tran- whole-cell model (1). We contribute to this international effort scriptional regulation in Escherichia coli K-12. This with RegulonDB, the primary database of the major inter- includes mechanistic information about operon national maintained curation of original literature with organization and their decomposition into transcrip- experimental knowledge about the elements and interactions tion units
    [Show full text]
  • Dataset Description
    Promoter_from_454_Dataset.txt Programa de Genómica Computacional / Centro de Ciencias Genómicas DATASET DESCRIPTION DATASET: Promoter_from_454_Dataset.txt. Experimental determination of Transcription Star Sites (TSSs) with High Throughput Pyrosequencing Strategy (HTPS) and computational promoter prediction. Contact person for this dataset: Person: RegulonDB team Email address: [email protected] Type of dataset: Experimental TSS mapping and computational promoter prediction Reference: Alfredo Mendoza, Leticia Olvera, Maricela Olvera, Ricardo Grande, Veronica Jiménez-Jacinto, Blanca Taboada, Leticia Vega, Katy Juárez, Heladia Salgado, Araceli Huerta, Julio Collado-Vides and Enrique Morett. (2009). Genome-wide Identification of Transcription Start Sites, Promoters and Transcription Factor Binding Sites in E. coli . PLoS ONE. 4(10):e7526. Description: This file contains: High Throughput Pyrosequencing Strategy (HTPS) Data, with and without previously know TSSs. For many, promoter region was predicted in this work. Summary: This file has a collection of more than 1491 transcription start sites (TSSs) that have been experimentally determined with high precision in Escherichia coli using an unbiased High Throughput Pyrosequencing Strategy (HTPS). From this collection, about 148 corresponded to previously reported TSSs, which helped us to benchmark both our methodologies and the accuracy of the previous mapping experiments. The other ca 1300 TSSs mapped belong to about 900 different genes, many of them with no assigned function. Página 1 de 4 Promoter_from_454_Dataset.txt Programa de Genómica Computacional / Centro de Ciencias Genómicas Methods Version of programs: We use blast version BLASTN 2.2.18 For the σ38 -10 element a new matrix was constructed with the WCONSENSUS program (version 6d), utilizing the nucleotide sequences of 71 promoters of E.
    [Show full text]
  • Investigating Binding Site Ratios of Transcription Factors Center for Advanced Biotechnology and Medicine, Stock Lab Zeyue Li Rutgers University
    Investigating Binding Site Ratios of Transcription Factors Center for Advanced Biotechnology and Medicine, Stock Lab Zeyue Li Rutgers University Abstract Results Bacterial signal transduction is controlled by transcription factors (TFs), which bind to specific binding sites on bacterial chromosomes.3 Expression levels of TFs are often presumed to be in Protein Abundance Data from PaxDB TFBS Identified by High Throughput Methods great excess to their binding sites, leading to high ratios of TF concentrations to the numbers of TF binding sites. Previous studies have suggested a median ratio of 10.4 Here we used the updated binding data from regulonDB and multiple protein abundance databases to reexamine the ratios for E. coli TFs. We discovered that TFs are unlikely to be in great excess to their binding sites. Further, we believe that the binding site data from regulonDB is biased toward the intergenic promoter regions and there are more binding sites that have not been discovered, especially those within open reading frames. To better estimate the number of binding sites, we took a bioinformatics direction, using multiple sources, including high-throughput data, gSELEX data, and computational predictions using FIMO scan of the bacterial genome. Although different sources vary in their reports of the number of binding sites, all indicated at least two fold higher numbers of binding sites than those from regulonDB. Overall, we found that repressors tend to have a higher ratio compared to activators and dual function TFs, and Figure 1: The median ratio of TF concentration (PaxDB) to TFBS is 7.23 we found that the originally reported ratio is higher than what studies with newer technology Figure 7: Box plot of the ratio of high throughput data to regulonDB data list.
    [Show full text]
  • Determination and Dissection of DNA-Binding Specificity for The
    International Journal of Molecular Sciences Article Determination and Dissection of DNA-Binding Specificity for the Thermus thermophilus HB8 Transcriptional Regulator TTHB099 Kristi Moncja and Michael W. Van Dyke * Department of Chemistry and Biochemistry, Kennesaw State University, Kennesaw, GA 30144, USA; [email protected] * Correspondence: [email protected]; Tel.: +1-470-578-2793 Received: 14 October 2020; Accepted: 24 October 2020; Published: 26 October 2020 Abstract: Transcription factors (TFs) have been extensively researched in certain well-studied organisms, but far less so in others. Following the whole-genome sequencing of a new organism, TFs are typically identified through their homology with related proteins in other organisms. However, recent findings demonstrate that structurally similar TFs from distantly related bacteria are not usually evolutionary orthologs. Here we explore TTHB099, a cAMP receptor protein (CRP)-family TF from the extremophile Thermus thermophilus HB8. Using the in vitro iterative selection method Restriction Endonuclease Protection, Selection and Amplification (REPSA), we identified the preferred DNA-binding motif for TTHB099, 50–TGT(A/g)NBSYRSVN(T/c)ACA–30, and mapped potential binding sites and regulated genes within the T. thermophilus HB8 genome. Comparisons with expression profile data in TTHB099-deficient and wild type strains suggested that, unlike E. coli CRP (CRPEc), TTHB099 does not have a simple regulatory mechanism. However, we hypothesize that TTHB099 can be a dual-regulator similar to CRPEc. Keywords: bioinformatics; biolayer interferometry (BLI); electrophoretic mobility shift assay (EMSA); extremophile; protein-DNA binding; type IIS restriction endonuclease 1. Introduction Transcription factors (TFs) are DNA-binding proteins that allow for modulation of transcription initiation in response to intracellular and extracellular changes.
    [Show full text]
  • A Biophysical Approach to Transcription Factor Binding Site Discovery Marko Djordjevic,1 Anirvan M
    Downloaded from genome.cshlp.org on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press Article A Biophysical Approach to Transcription Factor Binding Site Discovery Marko Djordjevic,1 Anirvan M. Sengupta,2 and Boris I. Shraiman2,3 1Department of Physics, Columbia University, New York, New York 10025, USA; 2Department of Physics and BioMaPSInstitute, Rutgers University, Piscataway, New Jersey 08854, USA Identification of transcription factor binding sites within regulatory segments of genomic DNA is an important step toward understanding of the regulatory circuits that control expression of genes. Here, we describe a novel bioinformatics method that bases classification of potential binding sites explicitly on the estimate of sequence-specific binding energy of a given transcription factor. The method also estimates the chemical potential of the factor that defines the threshold of binding. In contrast with the widely used information-theoretic weight matrix method, the new approach correctly describes saturation in the transcription factor/DNA binding probability. This results in a significant improvement in the number of expected false positives, particularly in the ubiquitous case of low-specificity factors. In the strong binding limit, the algorithm is related to the “support vector machine” approach to pattern recognition. The new method is used to identify likely genomic binding sites for the E. coli transcription factors collected in the DPInteract database. In addition, for CRP (a global regulatory factor), the likely regulatory modality (i.e., repressor or activator) of predicted binding sites is determined. [Supplemental material is available online at www.genome.org. The complete list of predicted sites may be found at http://www.biomaps.rutgers.edu/bioinformatics/QPMEME.htm.] Molecular biology has been revolutionized by the availability of some of the candidate regulatory sites.
    [Show full text]