Comparison of Protein and mRNA Profiles of Escherichia coli: Data Visualization and Analysis of Specific Gene Groups

Oleg Paliy,1,2 Brian Thomas,3 Rebecca Corbin,4 Feng Yang,4 Jeffrey Shabnowitz,4 Mark Platt,4 Charles E. Lyons, Jr.,4 Karen Root,4 Donald Hunt,4,5 and Sydney Kustu2

Department of Biochemistry and Molecular Biology, Wright State University, Dayton, Ohio 45435;1 Department of Plant & Microbial Biology,2 and College of Natural Resources,3 University of California, Berkeley, CA 94720; Department of Chemistry, University of Virginia, Charlottesville, VA 22901;4 and Department of Pathology, University of Virginia, Charlottesville, VA 22908.5

ABSTRACT
Despite recent progress in protein identification in whole cell lysates, many laboratories interested in global gene expression depend on assessment of mRNA rather than protein. Hence knowledge of the relationship between the two remains important. Although this relationship has been explored in eukaryotic cells, there are few studies of it in bacteria. In addition, previous studies have generally not considered illustrative examples. We previously detected with high reliability about one quarter of the proteins of E. coli (1147 proteins) in cells grown under a single condition in minimal medium and compared these proteins to global mRNA levels. Here we extend our earlier studies by presenting a simple visualization approach to facilitate comparisons between protein and mRNA data, and we consider examples of specific genes and operons for which the biological literature allows more meaningful analysis.

INTRODUCTION
Physiological studies of organisms whose genome sequences have been determined have been greatly advanced by the development of new techniques to sample cell composition globally at the mRNA, protein, and metabolite levels. The availability of global data has led to the introduction of the term "systems biology". Affymetrix GeneChip microarrays allow the comparison of mRNA levels under different growth conditions and also provide statistical estimates of mRNA abundance and of which genes are transcribed under a single condition [1-4]. Analysis of complex mixtures of tryptic peptides by mass spectrometry provides a powerful method for determining the protein composition of cells [5-7]. The availability of both types of data has allowed comparisons between them, and several such comparisons have been made for the yeast Saccharomyces cerevisiae [8-10].

We recently made general comparisons between proteins and mRNAs detected in E. coli strain MG1655 (CGSC 6300) grown in minimal medium with glycerol as carbon source [11]. Globally there appeared to be a positive relationship between protein detection, performed under conditions that favored proteins of greatest abundance, and mRNA levels. To understand the relationship between protein detection and mRNA abundance in greater depth, we here consider it for specific gene groups (translation apparatus, energy metabolism, motility and chemotaxis, cofactor biosynthesis, transcriptional regulators, and membrane and membrane-associated proteins). We also present a data visualization tool that facilitates comparison of whole cell mRNA and protein profiles. In most instances protein detection was associated with a high level of mRNA, as well as with greater protein length and solubility. We failed to detect cognate mRNA for only 34 of the proteins we identified.

RESULTS
Data visualization
We created a simple visualization tool that allowed us to display protein and mRNA presence calls in genome order (Fig. 1, http://coli.berkeley.edu/protein_profile/), as we did previously for DNA microarray data [12-15]. The resulting genome image facilitates analysis of the data, particularly in terms of operon organization. The tool can display protein and mRNA detection either as separate squares, each corresponding to a detected protein or mRNA, or as a vertical rectangle for cases where both protein and mRNA were detected for a particular gene. The image map on the web site allows the gene ID number [16], gene name, and gene description to be displayed above the image. Clicking on a spot of interest transfers the user to the E. coli Entry Point database (http://coli.berkeley.edu/cgi-bin/ecoli/coli_entry.pl) [14], where useful information about the gene can be retrieved easily.

Figure 1 - Genome image of protein and mRNA presence calls for E. coli MG1655 grown in minimal medium with glycerol as carbon source and NH4Cl as nitrogen source. Genes are arranged in their order on the chromosome of E. coli (according to the original E. coli annotation [16]), beginning with Blattner (b)0001 and progressing from left to right. There are 100 genes per row. To assist in viewing, every 10 genes are marked with a tick and a narrow vertical line, and the background of the rows alternates between light and dark gray. Green bars indicate genes for which protein and mRNA were both detected. Yellow squares indicate the 34 genes for which protein but not mRNA was detected (see text), whereas blue squares indicate genes for which mRNA but not protein was detected. Boxes correspond to some of the examples given in Table 1 and discussed in the text. Red boxes denote operons or clusters of operons with relatively abundant protein products. They are (in b number order): the trp operon (b1260-1265; b1265 is the trp leader); the his operon (b2018-2026; b2018 is the his leader); the nuo operon (b2276-2288); a cluster of ribosomal protein operons (b3294-3321); and the atp operon (b3731-3739). White boxes denote operons or clusters of operons with less abundant protein products. They are (in b number order): a cluster of murein-fts operons (b0081-0095); the lac operon and lac regulatory gene (b0342-0345; b0345 is lacI); and a cluster of flagellar (fli) operons (b1937-1950). At our web site (Protein data display: http://coli.berkeley.edu/protein_profile/), a cursor can be used to determine the b number, name, and description of each gene in the image. Links to the E. coli Entry Point (http://coli.berkeley.edu/ecoli/) facilitate obtaining additional information.

In our comparison, cognate mRNAs were not detected (called "absent" by the Affymetrix algorithm) for only 34 of the 1147 proteins (yellow squares, Fig. 1); none of the genes for these proteins had a high mRNA signal on the array (5-650, compared with a transcriptome average of 2000). Three of these proteins (products of mrcB [b0149], ftsX [b3462], and alr [b4053]) are known to be required for murein synthesis and cell division (see below), and one for fatty acid biosynthesis (product of fabF [b1095]) [16]. We happen to know independently that the GlnG regulatory protein (= NtrC; product of b3868) is also present [12, 17, 18]. Expression of the genes for these five proteins should have been detected at the mRNA level. We estimate that the limit for protein detection in our experiments was 50-100 protein copies per cell, whereas Affymetrix microarrays can detect 1 molecule of RNA in a complex mixture of 100,000 distinct RNA molecules [11, 19]. Another 9 proteins whose cognate mRNAs were "absent" were designated ORFs.

Gene examples
To make our understanding of the protein profile concrete, we looked at a number of specific examples (Table 1, Fig. 1; see also supplementary material).

Abundant proteins
We began with abundant proteins we expected to find on the protein list: ribosomal proteins, enzymes of glycerol and central carbon metabolism, and amino acid biosynthetic enzymes. Ribosomal proteins, which are among the most abundant in the cell (~15,000 copies under our growth conditions [20]), were well represented in the total list of proteins. Of the 55 ribosomal proteins, 49 were identified, and mRNA was detected for all 55 genes. The ribosomal proteins that were missed had very few predicted tryptic peptides (1 to 4, whereas the average was 6.0). Nine of the 12 proteins involved in glycerol catabolism were detected, including all of those known to be required for growth, and expression of all 12 genes was detected at the mRNA level. The glycerol facilitator (GlpF) and glycerol phosphate permease (GlpT) were detected only in the membrane sample [11]. Most proteins categorized as glycolytic (14 out of 18), gluconeogenic (all 4), or as components of the tricarboxylic acid cycle (15 out of 17) [21] were detected and their mean mRNA signal intensities,

flagellar gene [25], and there is direct evidence that its product is a periplasmic binding protein for cystine [26, 27]. Products of the lactose utilization operon were not detected, and mRNA was detected (unreliably) only for lacZ (Table 1 legend).

Of 160 known DNA-binding transcriptional regulators [28], we detected only 37 (23%), whereas 124 were considered expressed at the mRNA level. The average mRNA signal intensity for genes corresponding to regulators detected at the protein level was eight-fold higher than that for genes corresponding to undetected regulators.

We considered examples of proteins used for synthesis of cofactors that are required for growth in minimal medium, because we thought their expression levels might not be high. All of the genes whose products are thought to be required for synthesis of NAD, pyridoxine, riboflavin, thiamine, and biotin (35 total) ([29] and J. Cronan, personal communication) were expressed at the mRNA level, and half of their protein products were present in the total list. The average mRNA signal intensities for these five groups of genes were between 1300 and 2900 (Table 1). Some of the enzymes involved in cofactor synthesis are known to have low turnover numbers (J. Cronan, personal communication), and hence both transcripts and proteins may be more abundant than anticipated.

Finally, we considered a group of proteins whose expression levels were expected to vary widely within the group: proteins required for cell division (fts gene products).

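The observation under Abundant proteins that the missed ribosomal proteins had few predicted tryptic peptides can be made concrete with an in-silico digest. The sketch below assumes the simplified textbook trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages) and an illustrative peptide-length window of 6-30 residues; the paper does not state which rule or filters were used, so these parameters and the function names are hypothetical.

```python
import re

# Simplified trypsin rule (assumption): cleave on the C-terminal side of
# K or R unless the following residue is P. Zero-width split point.
_CLEAVAGE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(sequence: str) -> list[str]:
    """Return fully cleaved tryptic peptides (no missed cleavages)."""
    return [p for p in _CLEAVAGE.split(sequence.upper()) if p]


def count_detectable(sequence: str, min_len: int = 6, max_len: int = 30) -> int:
    """Count peptides inside a hypothetical MS-observable length window."""
    return sum(min_len <= len(p) <= max_len for p in tryptic_digest(sequence))


def low_peptide_proteins(proteins: dict[str, str], max_peptides: int = 4,
                         **window) -> list[str]:
    """Names of proteins predicted to give at most `max_peptides` peptides."""
    return sorted(name for name, seq in proteins.items()
                  if count_detectable(seq, **window) <= max_peptides)


if __name__ == "__main__":
    toy = "AAAAAAKGGGGGGGRPLLLLLLKMK"  # made-up sequence, not a real protein
    print(tryptic_digest(toy))          # ['AAAAAAK', 'GGGGGGGRPLLLLLLK', 'MK']
    print(count_detectable(toy))        # 2
```

The R followed by P in the toy sequence is not cleaved, so the second peptide spans it. Short peptides such as "MK" fall outside the assumed window, which is how a small protein can end up with only a handful of predicted detectable peptides.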