Data Resources and Mining Tools for Reconstructing Gene Regulatory Networks in Lactococcus Lactis
Total Page:16
File Type:pdf, Size:1020Kb
Japanese Journal of Lactic Acid Bacteria Copyright © 2011, Japan Society for Lactic Acid Bacteria Review Data resources and mining tools for reconstructing gene regulatory networks in Lactococcus lactis Anne de Jong1,2,3), Jan Kok1,2,3) and Oscar P. Kuipers1,2,3)* 1) Department of Molecular Genetics, University of Groningen, Groningen Biomolecular Sciences and Biotechnology Institute, Nijenborgh 7, 9747 AG Groningen, the Netherlands 2) Top Institute Food and Nutrition, Wageningen, the Netherlands 3) The Netherlands Kluyver Centre for Genomics of Industrial Fermentations / NCSB, Delft, the Netherlands. Abstract DNA is the blueprint and template for tRNA, rRNA, sRNA and mRNA synthesis in all living organisms. Subsequently, the mRNA is translated to proteins a process in which other RNA types, compounds and proteins play important roles. The regulation of transcription and translation is a delicate equilibrium of thousands of factors interacting within an organism. These factors vary from compound- and protein concentrations, to their activity and stability to physical parameters like temperature and pH. Especially the transcription factors and transcription factor binding sites play a central role in the regulation of gene expression. For lactococci lactis several data sets on gene expression and regulation are available in databases or literature. Furthermore, the number of (in-) complete sequenced lactococci genomes is increasing, while the majority of the strains will only be studied in silico. This review focuses on the visualization and mining of the complex transcription and translation systems via interactive graphics and network reconstruction tools and the amalgamation of DNA microarray data with biological data that together lead to the reconstruction of gene regulatory networks (GRN). These networks are a valuable source of information and knowledge that can be used for studying L. lactis physiology and provide clues for improving industrial strains by either non-GMO methods (strain selection, fermentation condition) or by specific engineering. Key words: DNA microarray, Gene Regulatory Network, Time series, Transcription factor, Transcription factor binding site, Lactococcus lactis Lactococcus lactis products as a rich source for proteins, small peptides, amino acids, fatty acids, calcium and other valuable Since 10.000 years humans consume fermented milk nutrients. The milk sugar lactose is fermented to lactic acid; the consequent acidification prevents food spoilage and is probably the major driving force of *To whom correspondence should be addressed. the success of fermented milk products. Although Phone : +31-50-3632092 nowadays other food preservation methods are applied, Fax : +31-50-3632348 E-mail : [email protected] the acidification of the product is still the major growth : [email protected] inhibitor of bacteria other than Lactococci. The global : [email protected] world market of milk-related products is estimated —3— Vol. 22, No. 1 Japanese Journal of Lactic Acid Bacteria at € 240 billion (IMIS Euromonitor 2006) explaining smaller genes. Prediction of a small ORF (smaller than the high interest of the diary industry in research on 120-150 bases) is still very complicated. In annotation food process management and improvement. To gain processes small ORFs are habitually discarded when more insight in flavour formation, intensive research they do not show homology to known proteins. Of is taking place to unraveling genomes, transcriptomes, course, in this way, novel small genes will never proteomes and (bio-) chemical pathways. The Holy be discovered. BAGEL2 9), a bacteriocin mining Grail would be to build a predictive in silico model of tool, circumvents this issue by taking the genomic each bacterial strain involved in milk fermentation. context into account and is able to discover small Probably this holistic result will not be obtained ORFs encoding novel lantibiotics. The main resources soon, not even for one of the organisms, but recent for annotated genomes, NCBI, EBI and JGI, offer work has already led to models describing various multiple ORF predictions tools made by Glimmer3 and relatively simple processes in the bacterial cell. The PRODIGAL. basis of this research is the availability of genome sequences of lactic acid bacteria but, although many Operon prediction L. lactis genomes have been sequenced, mainly by industrial companies, only a few are available in the An operon is defined as a cluster of genes that public domain (from the strains MG1363, IL1403, SK11 are translated from one mRNA, which is thus called and KF147) at“The Integrated Microbial Genomes a polycistronic messenger, and are under control of (IMG) system”1). In this review we will discuss the the same promoter. To accurately predict operon state of the art bioinformatics tools that facilitate the purely on the basis of genome sequences is still a reconstruction and visualization of gene regulatory challenging task. Using gene direction, gene spacing networks in L. lactis. and transcription terminator prediction programs in combination with comparative genomics, up to 95% Gene prediction accuracy of operon prediction can be achieved 10). Currently, databases provide operons predicted Normally, splicing of mRNA does not occur in in publically accessible genomes. For the de novo prokaryotes, which would suggest that prediction annotation or re-annotation of operons, UNIPOP of a protein encoding gene (open reading frame provides an easy, quick and accurate way 11, 12). (ORF)) is a simple and straight forward process. DOOR 13) (Database of prOkaryotic OpeRons) using the Although prediction methods like GeneMarkHMM 2), algorithm developed by Dam 14)) which is believed to Glimmer 3 3), TriTISA 4), TICO 5), Easygene 6), MED be the most accurate prediction method as described 2.0 7) and PRODIGAL 8) predict up to 90% of the genes in an intensive comparison study 15). MicrobesOnline is correctly in most prokaryotes, still quite a number of a product of the Virtual Institute for Microbial Stress genes are incorrect predicted or missed 8). All methods and Survival; it provides operons for 620 genomes struggle with predicting correct translation start while OperonDB contains operons for 550 genomes, point (AUG, GUG and UUG) and use Shine-Dalgarno ODB 16) for 203 genomes. RegulonDB 17) basically Ribosomal Binding Site (RBS) motifs to improve provides E. coli operons and DBTBS 18) those for B. the prediction of the translation initiation site. An subtilis. While operons in OperonDB 19), Microbes important difference between coding and non-coding Online and ODB 16) are only predicted, operons defined DNA regions is the GC bias and this phenomenon has in RegulonDB and DBTBS are mainly based on been used to improve the ORF prediction. For each laboratory experiments. organism this bias can be calculated and is used to plot reading frame information for detection of the Transcriptomics spurious upstream gene region. Furthermore, modern prediction methods use an ORF length threshold of 90- Bacteria often encounter changes in their 120 bp, because of the high rate of false positives for environment and will adapt the expression of genes —4— Jpn. J. Lactic Acid Bact. Vol. 22, No. 1 and activity of proteins to optimize growth, which Transcription regulators in Lactococcus lactis is of utmost importance to survive in a competitive niche. By sensing the changes in the intracellular In the genome of L. lactis subsp. cremoris MG1363 and/or extracellular environment, the cell can adjust 154 genes are annotated as ncodong TFs. Among the transcription of genes and subsequently change these, 32 have been studied in either L. lactis MG1363 its protein content, to cope with the new situation. or in L. lactis IL1403 and among these, 7 genes encode Transcription of genes (gene expression) is tightly TFs that are members of a two component system regulated and mostly mediated by sigma factors and (Table 1). Another 55 TFs have high homology to other trans-acting transcription factors (transcription regulators in other bacteria with known function regulator proteins; TFs). Furthermore, micro RNA’s (not shown) while 68 are unknown TFs annotated as and T-boxes or S-boxes, can be involved in mediating putative, based on conserved protein motifs involved the level of transcription. The TFs can be activated or in DNA-binding. deactivated when they directly or indirectly“sense” changes in concentration of physical constants (e.g. Regulons temperature), ions or other molecules in the cell. Besides these types of TFs, two component signal Genes or operons that are under control of the same transduction systems (e.g. by phosphorelay) exist TF are member of a regulon. Although prediction that specifically react on external signals 20). DNA methods for regulons have been substantially microarray studies show the expression of all genes Table 1. Transcriptional regulators studied in L. lactis at a defined state of the cells transcriptome. Different MG1363 and/or IL1403 growth rates, stress or other conditions and also the Gene Literature co-expression of genes can be used to build a gene- AhrC MG1363 57, 58) 57, 58) to-gene interactive network including their deduced ArgR MG1363 BusR 59, 60) proteins and subsequent metabolic compounds. The 61) CcpA MG1363 co-regulation of a collection of genes by the same TF CodY MG1363 22, 62) is called a regulon. ComX 63) CopR IL1403 64) CtsR MG1363 65, 66) Transcription factors FhuR IL1403 67) FlpA MG1363 68) 68) Transcription of DNA to mRNA can be activated FlpB MG1363 FruR 69) or repressed when a TF binds to a specific cis- 70, 71) GadR regulatory DNA element (TFBS) in the promoter GntR MG1363 72) 73) region of an operon. The DNA binding region of HdiR MG1363 HisZ 74, 75) TFs ofthen consists of a conserved Helix-Turn-Helix LlrA MG1363 76) secondary structure, a motif that is prevalent in TFs LlrB MG1363 76) of prokaryotes 21). Furthermore, it is common that a LlrC MG1363 76) 77) co-factor molecule is required to activate a TF e.g. LlrD LlrE MG1363 76) branched-chain amino acids (BCAAs) for CodY of L.