Proteogenomic analysis and global discovery of posttranslational modifications in prokaryotes

Ming-kun Yangᵃ,¹, Yao-hua Yangᵃ,¹, Zhuo Chenᵃ, Jia Zhangᵃ, Yan Linᵃ, Yan Wangᵃ, Qian Xiongᵃ, Tao Liᵃ,², Feng Geᵃ,², Donald A. Bryantᵇ, and Jin-dong Zhaoᵃ

ᵃKey Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; and ᵇDepartment of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802

Edited* by Jiayang Li, Chinese Academy of Sciences, Beijing, China, and approved November 13, 2014 (received for review July 6, 2014)

We describe an integrated workflow for proteogenomic analysis and global profiling of posttranslational modifications (PTMs) in prokaryotes and use the model cyanobacterium Synechococcus sp. PCC 7002 (hereafter Synechococcus 7002) as a test case. We found more than 20 different kinds of PTMs, and a holistic view of PTM events in this organism grown under different conditions was obtained without specific enrichment strategies. Among 3,186 predicted protein-coding genes, 2,938 gene products (>92%) were identified. We also identified 118 previously unidentified proteins and corrected 38 predicted gene-coding regions in the Synechococcus 7002 genome. This systematic analysis not only provides comprehensive information on protein profiles and the diversity of PTMs in Synechococcus 7002 but also provides some insights into photosynthetic pathways in cyanobacteria. The entire proteogenomics pipeline is applicable to any sequenced prokaryotic organism, and we suggest that it should become a standard part of genome annotation projects.

proteogenomics | post-translational modifications | cyanobacteria | photosynthesis | Synechococcus

Significance

Proteogenomics is the application of mass spectrometry-derived proteomic data for testing and refining predicted genetic models. Cyanobacteria, the only prokaryotes capable of oxygenic photosynthesis, are the ancestor of chloroplasts in plants and play crucial roles in global carbon and nitrogen cycles. An integrated proteogenomic workflow was developed, and we tested this system on a model cyanobacterium, Synechococcus 7002, grown under various conditions. We obtained a nearly complete genome translational profile of this model organism. In addition, a holistic view of posttranslational modification (PTM) events is provided using the same dataset, and the results provide insights into photosynthesis. The entire proteogenomics pipeline is applicable to any sequenced prokaryotes and could be applied as a standard part of genome annotation projects.

Author contributions: T.L., F.G., and J.-d.Z. designed research; M.-k.Y., Y.-h.Y., Z.C., Y.L., Y.W., T.L., and F.G. performed research; D.A.B. contributed new reagents/analytic tools; M.-k.Y., Y.-h.Y., Z.C., J.Z., Y.L., Q.X., T.L., F.G., and J.-d.Z. analyzed data; and M.-k.Y., Z.C., T.L., F.G., D.A.B., and J.-d.Z. wrote the paper.

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

Freely available online through the PNAS open access option.

Data deposition: MS raw data files, processed data files, annotated peptide spectral matches for all accepted single peptide hits, annotated peptide spectrum matches of identified peptides with stop codon read-through, annotated spectra of all identified peptides with specific PTMs, snapshots of genome browser for all the previously unidentified genes along with the corrections to the existing gene models identified in this study, Tables S1–S9, RNA-Seq data, and in-house integrated proteogenomic analysis software have been deposited in the PeptideAtlas repository (www.peptideatlas.org) (accession no. PASS00285).

¹M.-k.Y. and Y.-h.Y. contributed equally to this work.

²To whom correspondence may be addressed. Email: [email protected] or [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1412722111/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1412722111 PNAS | Published online December 15, 2014 | E5633–E5642

Proteogenomics refers to the correlation of mass spectrometry-derived proteomic data to refine genome annotation (1) and has been applied to the identification of previously unidentified genes and the correction and validation of predicted genes in various organisms (2–4). It is an important tool for integrating protein-level information into the genome annotation process and can greatly improve genome annotation quality. The same experimental proteomic datasets are also useful in identifying posttranslational modifications (PTMs) on a proteome-wide level (5, 6). Many cellular proteins undergo appreciable amounts of PTM in response to certain stimuli, and this dynamic process occurs in various cell compartments to dictate the fate and activity of the modified proteins (7). Identification and mapping of PTMs in proteins have been improved dramatically, mainly due to increases in the sensitivity, speed, accuracy, and resolution of mass spectrometry (MS). However, system-wide identification of multiple PTMs remains a highly challenging task, especially in situations where some reversible PTMs are induced by a particular stimulus and are present for only a short period (8). To the best of our knowledge, very few reports of proteogenomic datasets have presently been used to analyze PTM events comprehensively in a genome sequence (9, 10).

In this study, we developed a proteogenomic approach to carry out genome annotation and whole-proteome analysis of PTMs in prokaryotes by using high resolution and high accuracy MS data and the cyanobacterium Synechococcus sp. PCC 7002 (hereafter Synechococcus 7002) as a test case. Cyanobacteria are a morphologically diverse group of Gram-negative bacteria and are the only prokaryotes capable of oxygenic photosynthesis (11). It is estimated that more than half of the photosynthetic activity on Earth is contributed by cyanobacteria (12). Cyanobacteria make substantial contributions to global CO₂ assimilation, O₂ production, and N₂ fixation and are the progenitors of chloroplasts in higher plants (13). Cyanobacterial habitats are highly diverse, and cyanobacterial cells adjust their cellular activities in response to a wide range of environmental cues and stimuli. Recently, cyanobacteria have attracted great interest due to their crucial roles in global carbon and nitrogen cycles and their ability to produce clean and renewable biofuels such as hydrogen (14–16). Synechococcus 7002 is a unicellular, marine cyanobacterium and a model organism for studying photosynthetic carbon fixation and the development of biofuels (17, 18). However, whereas the genome of Synechococcus 7002 is fully sequenced, it is annotated only by in silico methods (www.ncbi.nlm.nih.gov/), with a large portion (1,210 out of 3,186) of protein-coding genes annotated as hypothetical proteins (17). Therefore, a comprehensive analysis is needed to provide experimental support for the genome annotation so as to facilitate systems-level analysis. Using our method, we performed the validation of the predicted protein-coding genes, identified previously unidentified genes, and corrected gene initiation and stop-codon positions in Synechococcus 7002, and directional RNA-Seq was used to determine the existence of a number of previously unidentified genes identified in this study.

Fig. 1. Experimental and bioinformatic workflow of the proteogenomic analysis.
Protein extracts were prepared from Synechococcus 7002 cultures grown to exponential phase (flask 1) and stationary phase (flask 2) and exposed to stresses, including iron deficiency (flask 3), phosphate deficiency (flask 4), nitrogen deficiency (flask 5), A5 deficiency (flask 6), high light (flask 7), and high salt concentration (flask 8) (2.5 M). After protein extraction, proteins are subjected to Glu-C and Trypsin digestion, producing peptide mixture. The mixture was analyzed by means of nano-LC-MS/MS with an LTQ-Orbitrap Elite mass spectrometer. MS/MS peptide spectra were searched against specific organism genome sequences, validating and correcting genomic annotations, as well as identifying previously unidentified protein-coding genes and diverse PTMs.

More importantly, we characterized PTM features on a proteome-wide level using the same experimental proteomic datasets and identified many different PTM types that may play important roles in cellular functions. Our proteogenomic data provided significant information for revising the genome annotation of Synechococcus 7002 and offered insights into the physiology of this model organism. The method and approach can also be used to study genome annotation and cellular protein PTMs in other organisms.

Results

Proteogenomic Strategy for the Analysis of Synechococcus

… identified from the two fractionation methods applied for analyzing the cell-lysate proteins. Sequest, Mascot, MaxQuant, pFind, and X!Tandem searches identified 238,918; 239,779; 252,696; 241,601; and 268,763 peptides, respectively, resulting in the identification of 55,862 unique peptides.

Protein Expression Analysis of Synechococcus. The CyanoBase database (genome.microbedb.jp/cyanobase/) lists 3,186 protein-coding genes in the Synechococcus 7002 genome (3.41 Mb, released 2012). Unique peptides were mapped onto the genome-translated database, and proteins identified by at least two unique peptides or