Protein-DNA Interactions II

Bio 5488

Overview
1. Review of information content and weight matrices
2. Online PWM resources
3. Motif discovery: Greedy algorithm
4. Motif discovery: Gibbs sampling

Information Content

EcoRI    Random   Rap1
GAATTC   GCCTAC   TGTATGGGTG
GAATTC   ACATTC   TGTTCGGATT
GAATTC   TCATTC   TGCATGGGTG
GAATTC   CGACTC   TGTACAGGTG
GAATTC   GAATTC   TGTATGGATG
GAATTC   ATATCG   TGTTCGGGTT
GAATTC   GAAATG   TGTATGGGTG

Information Content
Matrix of Frequencies

A: 0.1 0.7 0.2 0.3 0.4 0.1
C: 0.1 0.1 0.1 0.3 0.2 0.1
G: 0.1 0.1 0.2 0.1 0.2 0.1
T: 0.7 0.1 0.5 0.3 0.2 0.7

aka Relative Entropy, Kullback-Leibler Distance

Obtaining a Weight Matrix
Hertz and Stormo, Bioinformatics (1999)

Online tools: PWMs
JASPAR (jaspar.genereg.net)
• Open-access, curated, non-redundant PWMs for multi-cellular eukaryotes
• Hundreds of PWMs
• JASPAR_FAM: metamodels for structural families

Online tools: PWMs
TRANSFAC (www.gene-regulation.com/pub/databases.html/#transfac)
• Free access, with reduced functionality compared to the professional version
• Public version dates to 2005

AC  M00213
XX
ID  F$RAP1_C
XX
DT  05.09.1995 (created); dbo.
DT  30.11.1995 (updated); ewi.
CO  Copyright (C), Biobase GmbH.
XX
NA  RAP1
XX
DE  yeast repressor/activator protein 1
XX
BF  T00715 RAP1; Species: yeast, Saccharomyces cerevisiae.
XX
PO       A      C      G      T
01    6.89   2.40   0.79   4.74   W
02    8.80   0.00   4.58   1.43   R
03    7.20   6.83   0.79   0.00   M
04   14.82   0.00   0.00   0.00   A
05    0.00  14.82   0.00   0.00   C
06    0.00  14.82   0.00   0.00   C
07    0.00  14.82   0.00   0.00   C
08   14.82   0.00   0.00   0.00   A
09    2.13   0.67   2.87   9.15   T
10   11.92   0.76   1.49   0.64   A
11    0.76  14.06   0.00   0.00   C
12   11.45   3.36   0.00   0.00   A
13    0.79   5.64   0.00   7.10   Y
14    1.64   7.18   1.61   4.39   Y
XX
BA  total weight of sequences: 14.82
XX
CC  consind generated matrix (random_expectation: 0.03)
XX
//

Online tools: PWMs
RegulonDB (regulondb.ccg.unam.mx)
• Bacterial regulons and operons in E. coli
• 68 PWMs (as of 2013)

; Background model
; Bernoulli model (order=0)
; Strand undef
; Background pseudo-frequency 0.01
; Residue probabilities
; a 0.29066
; c 0.20779
; g 0.20481
; t 0.29673
a |  7  5  2  6  8 18  2  9  1  2  1  3  0  2  5 11  4
c |  2  0  3  0  2  1  6  2  3  4  4  3  0  2  1  0 15
g |  5  2  7  0  0  0  5  2  0  2  0  6 15  0  0  3  0
t |  6 13  8 14 10  1  7  7 16 12 15  8  5 16 14  6  1
//
a | 0.3 0.3 0.1 0.3 0.4 0.9 0.1 0.4 0.1 0.1 0.1 0.2 0.0 0.1 0.3 0.5 0.2 0.8
c | 0.1 0.0 0.2 0.0 0.1 0.1 0.3 0.1 0.2 0.2 0.2 0.2 0.0 0.1 0.1 0.0 0.7 0.1
g | 0.2 0.1 0.3 0.0 0.0 0.0 0.2 0.1 0.0 0.1 0.0 0.3 0.7 0.0 0.0 0.2 0.0 0.1
t | 0.3 0.6 0.4 0.7 0.5 0.1 0.3 0.3 0.8 0.6 0.7 0.4 0.3 0.8 0.7 0.3 0.1 0.1
//
a |  0.2 -0.1 -1.0  0.0  0.3  1.1 -1.0  0.4 -1.6 -1.0 -1.6 -0.6 -3.0 -1.0 -0.1  0.6 -0.4  1.0
c | -0.7 -3.0 -0.3 -3.0 -0.7 -1.3  0.4 -0.7 -0.3 -0.0 -0.0 -0.3 -3.0 -0.7 -1.3 -3.0  1.2 -1.3
g |  0.2 -0.7  0.5 -3.0 -3.0 -3.0  0.2 -0.7 -3.0 -0.7 -3.0  0.4  1.3 -3.0 -3.0 -0.3 -3.0 -1.3
t |  0.0  0.8  0.3  0.8  0.5 -1.6  0.2  0.2  1.0  0.7  0.9  0.3 -0.2  1.0  0.8  0.0 -1.6 -1.0
//
a |  0.1 -0.0 -0.1  0.0  0.1  1.0 -0.1  0.2 -0.1 -0.1 -0.1 -0.1 -0.0 -0.1 -0.0  0.3 -0.1  0.8
c | -0.1 -0.0 -0.0 -0.0 -0.1 -0.1  0.1 -0.1 -0.0 -0.0 -0.0 -0.0 -0.0 -0.1 -0.1 -0.0  0.9 -0.1
g |  0.0 -0.1  0.2 -0.0 -0.0 -0.0  0.0 -0.1 -0.0 -0.1 -0.0  0.1  0.9 -0.0 -0.0 -0.0 -0.0 -0.1
t |  0.0  0.5  0.1  0.6  0.2 -0.1  0.1  0.1  0.7  0.4  0.7  0.1 -0.0  0.7  0.6  0.0 -0.1 -0.1
//
; Sites 20

Online tools: Logos
WebLogo (weblogo.berkeley.edu)
• Creates logos from multiple sequence alignments
• (not online but useful: seqLogo package for R)

Motif Finding Problem
• A fundamental problem in molecular biology
  – Specific protein-DNA binding
  – Transcription factor binding site recognition
• Statistical definition:
  – Given some sequences, find over-represented substrings (motif discovery)
• Biological example:
  – Given some co-regulated promoters, find transcription factor binding model
• Many algorithms/programs developed
  – consensus, Gibbs sampling, EM, projection, phylogenetic footprinting, etc.
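The information content and weight matrix ideas reviewed above can be illustrated with a short Python sketch. It is not taken from the slides: the uniform background (0.25 per base), the base-2 logarithm, the frequency floor of 0.01, and the function names `frequencies`, `information_content`, and `log_odds` are all assumptions made for illustration. Real tools may use a different background, log base, or pseudocount scheme, so their numbers need not match.

```python
import math

BASES = "ACGT"
# Assumed background: uniform base composition (not specified on the slide).
UNIFORM = {b: 0.25 for b in BASES}

def frequencies(sites):
    """Column-wise base frequencies of equal-length aligned sites."""
    n = len(sites)
    return [{b: col.count(b) / n for b in BASES} for col in zip(*sites)]

def information_content(freqs, background=UNIFORM):
    """Relative entropy in bits: sum over columns i and bases b of
    f[i][b] * log2(f[i][b] / p[b]), with the convention 0 * log(0) = 0."""
    return sum(
        f * math.log2(f / background[b])
        for col in freqs
        for b, f in col.items()
        if f > 0
    )

def log_odds(freqs, background=UNIFORM, floor=0.01):
    """Weight matrix of log2(f / p). Zero frequencies are raised to `floor`
    (a hypothetical choice) so the logarithm stays finite."""
    return [
        {b: math.log2(max(f, floor) / background[b]) for b, f in col.items()}
        for col in freqs
    ]

# The three-column example from the Information Content slide.
ecori = ["GAATTC"] * 7
random_sites = ["GCCTAC", "ACATTC", "TCATTC", "CGACTC",
                "GAATTC", "ATATCG", "GAAATG"]

# The "Matrix of Frequencies" slide, transposed into one dict per column.
slide_matrix = [dict(zip(BASES, col)) for col in zip(
    (0.1, 0.7, 0.2, 0.3, 0.4, 0.1),  # A
    (0.1, 0.1, 0.1, 0.3, 0.2, 0.1),  # C
    (0.1, 0.1, 0.2, 0.1, 0.2, 0.1),  # G
    (0.7, 0.1, 0.5, 0.3, 0.2, 0.7),  # T
)]

if __name__ == "__main__":
    print("EcoRI :", information_content(frequencies(ecori)))
    print("Random:", information_content(frequencies(random_sites)))
    print("Slide :", information_content(slide_matrix))
```

Under these assumptions a perfectly conserved column contributes 2 bits, so the seven identical EcoRI sites reach the maximum for a 6 bp site, while the random set scores far lower.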
Motif Finding Algorithms
• Motivation
  – Motif discovery is a problem whose straightforward solution is intractable
• Algorithms utilize different strategies
  – Greedy algorithm
  – Gibbs sampler

The Problem
Consider a set of sequences that are believed to harbor common subsequences (one per sequence) that are similar but not identical. Find the locations of the subsequences and a compact representation of the alignment of subsequences (e.g., a PWM). The locations should be selected so as to maximize the information content of the alignment.

An (intractable) solution …
(Exhaustive algorithm) Construct every possible combination of alignments and keep the one with the highest information content. Given a motif of width w and k sequences of length l, there are L = (l-w+1) possible locations in each sequence, and L^k alignments to check.

Real-world case
The Data Set: Sequences containing sites for cAMP receptor protein (CRP)

locus      sequence
cole1      taatgtttgtgctggtTTTTGTGGCATCGGGCGAGAATagcgcgtggtgtgaaagactgtTTTTTTGATCGTTTTCACAAAAatggaagtccacagtcttgacag
ecoarabop  gacaaaaacgcgtaacAAAAGTGTCTATAATCACGGCAgaaaagtccacattgaTTATTTGCACGGCGTCACACTTtgctatgccatagcatttttatccataag
ecobglr1   acaaatcccaataacttaattattgggatttgttatatataactttataaattcctaaaattacacaaagttaatAACTGTGAGCATGGTCATATTTttatcaat
ecocrp     cacaaagcgaaagctatgctaaaacagtcaggatgctacagtaatacattgatgtactgcatGTATGCAAAGGACGTCACATTAccgtgcagtacagttgatagc
ecocya     acggtgctacacttgtatgtagcgcatctttctttacggtcaatcagcaAGGTGTTAAATTGATCACGTTTtagaccattttttcgtcgtgaaactaaaaaaacc
ecodeop    agtgaaTTATTTGAACCAGATCGCATTAcagtgatgcaaacttgtaagtagatttccttAATTGTGATGTGTATCGAAGTGtgttgcggagtagatgttagaata
ecogale    gcgcataaaaaacggctaaattcttgtgtaaacgattccacTAATTTATTCCATGTCACACTTttcgcatctttgttatgctatggttatttcataccataagcc
ecoilvbpr  gctccggcggggttttttgttatctgcaattcagtacaAAACGTGATCAACCCCTCAATTttccctttgctgaaaaattttccattgtctcccctgtaaagctgt
ecolac     aacgcaatTAATGTGAGTTAGCTCACTCATtaggcaccccaggctttacactttatgcttccggctcgtatgttgtgtggAATTGTGAGCGGATAACAATTTcac
ecomale    acattaccgccaaTTCTGTAACAGAGATCACACAAagcgacggtggggcgtaggggcaaggaggatggaaagaggttgccgtataaagaaactagagtccgttta
ecomalk    ggaggaggcgggaggatgagaacacggcTTCTGTGAACTAAACCGAGGTCatgtaaggaatttcgtgatgttgcttgcaaaaatcgtggcgattttatgtgcgca
ecomalt    gatcagcgtcgttttaggtgagttgttaataaagatttggAATTGTGACACAGTGCAAATTCagacacataaaaaaacgtcatcgcttgcattagaaaggtttct
ecoompa    gctgacaaaaaagattaaacataccttatacaagacttttttttcatATGCCTGACGGAGTTCACACTTgtaagttttcaactacgttgtagactttacatcgcc
ecotnaa    ttttttaaacattaaaattcttacgtaatttataatctttaaaaaaagcatttaatattgctccccgaacGATTGTGATTCGATTCACATTTaaacaatttcaga
ecouxu1    cccatgagagtgaaatTGTTGTGATGTGGTTAACCCAAttagaattcgggattgacatgtcttaccaaaaggtagaacttatacgccatctcatccgatgcaagc
pbr-p4     ctggcttaactatgcggcatcagagcagattgtactgagagtgcaccatatgCGGTGTGAAATACCGCACAGATgcgtaaggagaaaataccgcatcaggcgctc
trn9cat    CTGTGACGGAAGATCACTTCgcagaataaataaatcctggtgtccctgttgataccgggaagccctgggccaacttttggcgaAAATGAGACGTTGATCGGCACG
tdc        gatttttatactttaacttgttgatatttaaaggtatttaattgtaataacgatactctggaaagtattgaaagttaATTTGTGAGTGGTCGCACATATcctgtt

For this case, there are 18 sequences of length 105 bp and we are looking for a motif of width 20 bp. There are 86 different 20 bp subsequences per sequence and ~7 × 10^34 alignments to check.
Stormo and Hartzell, Proc. Natl. Acad. Sci. (1989)

Greedy Algorithm: The Idea
The exhaustive approach is not possible. A compromise: solve a series of smaller problems in a step-wise exhaustive fashion. The trick is to start with a smaller set of sequences (k = 2) that we can solve exactly, and incorporate the rest of the sequences one by one.

Greedy Algorithm
The algorithm:
1. From the first sequence, generate scoring matrices for all possible subsequence locations. Initially the counts will be 1 or 0.
2. The next sequence is scanned with all of these matrices and the best scoring location identified.
3. The matrices are updated by folding in information from the newly identified sites in the next sequence.
4. Repeat steps 2 and 3 with the updated matrices, scanning the next sequence, until all sequences have been processed.
The details are in the matrix update: you can dump low information content matrices, keep only the highest X by score, etc. Here, all matrices are kept, and updated with the highest scoring site in the sequence under consideration. In case of a tie, keep both.

Greedy Algorithm Example
Toy problem: 3 sequences of length 7 bp, looking for a 6 bp motif
• Extract all 6 bp subsequences from the 1st sequence and create a matrix for each
• For each matrix, scan the 2nd sequence, align to the best match, and make new matrices
• Scan the 3rd sequence and align to the best match. Update matrices.
• No more sequences: take the best by I.C.
Hertz, Hartzell III, Stormo, Bioinformatics (1990)

Greedy Algorithm
How to think about this? The exhaustive treatment was too expensive, but this algorithm is a step-wise approximation. The first step performs a full scan of each motif alignment on the first sequence against all alignments on the second. As we step-wise incorporate additional sequences, we carry along a finite amount of information (matrices), so the requirements don’t explode.
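The contrast between the exhaustive count and the greedy procedure described above can be sketched in Python. This is a simplified illustration under stated assumptions, not the published implementation: each partial alignment is extended with the window that maximizes the information content of the extended alignment (rather than being scored with a weight matrix), ties are broken by taking the first window instead of keeping both, the background is uniform, and the names `n_alignments`, `information_content`, and `greedy_motif_search` are hypothetical.

```python
import math
from collections import Counter

def n_alignments(k, l, w):
    """Number of alignments the exhaustive search must check: (l - w + 1) ** k."""
    return (l - w + 1) ** k

def information_content(sites, background=0.25):
    """Sum over columns of f * log2(f / p) for each observed base.
    Unobserved bases contribute nothing (0 * log(0) = 0).
    A uniform background p = 0.25 is assumed."""
    n = len(sites)
    total = 0.0
    for column in zip(*sites):
        for count in Counter(column).values():
            f = count / n
            total += f * math.log2(f / background)
    return total

def greedy_motif_search(seqs, w):
    """Step-wise search: one alignment per window of the first sequence,
    each extended by the best window of every following sequence."""
    seqs = [s.upper() for s in seqs]
    # Step 1: every window of the first sequence seeds its own alignment.
    alignments = [[seqs[0][i:i + w]] for i in range(len(seqs[0]) - w + 1)]
    # Steps 2-4: fold in the remaining sequences one at a time.
    for seq in seqs[1:]:
        windows = [seq[i:i + w] for i in range(len(seq) - w + 1)]
        alignments = [
            # Simplification: keep only the first best-scoring extension.
            max((aln + [win] for win in windows), key=information_content)
            for aln in alignments
        ]
    # No more sequences: report the alignment with the highest information content.
    return max(alignments, key=information_content)

if __name__ == "__main__":
    # CRP case from the slides: 18 sequences, 105 bp each, 20 bp motif.
    print(n_alignments(18, 105, 20))
    # Hypothetical toy input with one shared 6 bp site per sequence.
    toy = ["ttGAATTCaa", "GAATTCcgcg", "acgtGAATTC"]
    best = greedy_motif_search(toy, 6)
    print(best, information_content(best))
```

The number of alignments carried forward never exceeds the number of windows in the first sequence, so the work grows roughly linearly with the number of sequences instead of exponentially.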