Article (Submitted Version)
Reference: OMENN, Gilbert S, et al. Metrics for the Human Proteome Project 2016: Progress on Identifying and Characterizing the Human Proteome, Including Post-Translational Modifications. Journal of Proteome Research, 2016, vol. 15, no. 11, p. 3951-3960. DOI: 10.1021/acs.jproteome.6b00511. PMID: 27487407. Available at: http://archive-ouverte.unige.ch/unige:86187
Perspective. J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.6b00511 • Publication Date (Web): 03 Aug 2016
Metrics for the Human Proteome Project 2016: Progress on Identifying and Characterizing the Human Proteome, Including Post-Translational Modifications

Gilbert S. Omenn,*,× Lydie Lane,∞ Emma K. Lundberg,µ Ronald C. Beavis,α Christopher M. Overall,φ and Eric W. Deutsch π

× Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States

∞ CALIPHO Group, SIB Swiss Institute of Bioinformatics and Department of Human Protein Science, University of Geneva, CMU, Michel-Servet 1, 1211 Geneva 4, Switzerland

µ SciLifeLab Stockholm and School of Biotechnology, KTH, Karolinska Institutet Science Park, Tomtebodavägen 23, SE-171 65 Solna, Sweden

α Biochemistry & Medical Genetics, University of Manitoba, Winnipeg, MB, Canada R3T 2N2

φ Overall Lab, University of British Columbia, 2350 Health Sciences Mall, Room 4.420, Vancouver, BC Canada V6T 1Z3

π Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109-5263, United States

KEYWORDS. Metrics, guidelines, neXtProt, PeptideAtlas, GPMDB, Human Protein Atlas, PTMs (post-translational modifications), N-termini, SAAV (single amino acid variants), splice isoforms.

ABSTRACT

The HUPO Human Proteome Project (HPP) has two overall goals: (1) stepwise completion of the protein parts list, the draft human proteome, confidently identifying and characterizing at least one protein product from each protein-coding gene, with increasing emphasis on the sequence variants, post-translational modifications, and splice isoforms of those proteins, and (2) making proteomics an integrated counterpart to genomics throughout the biomedical and life sciences community.

PeptideAtlas and GPMDB reanalyze all major mass spectrometry datasets available through ProteomeXchange with standardized protocols and stringent quality filters; neXtProt curates and integrates mass spectrometry and other findings. The HPP Guidelines for Mass Spectrometry Data Interpretation version 2.0 were applied to manuscripts submitted for this 2016 C-HPP-led special issue [www.thehpp.org/guidelines].

The Human Proteome presented as neXtProt version 2016-02 has 16,518 confident protein identifications (Protein Existence [PE] Level 1), up from 13,664 at 2012-12, 15,646 at 2013-09, and 16,491 at 2014-10. There are 485 proteins that would have been PE1 under the Guidelines v1.0 from 2012, but now have insufficient evidence due to the agreed-upon more stringent Guidelines v2.0 to reduce false-positives. neXtProt and PeptideAtlas now both require two non-nested, uniquely-mapping (proteotypic) peptides of at least 9 aa in length. There are 2949 missing proteins (PE2+3+4) as the baseline for submissions for the 4th annual C-HPP special issue of Journal of Proteome Research. PeptideAtlas has 14,629 canonical (plus 1187 uncertain and 1755 redundant) entries.
GPMDB has 16,190 EC4 entries, and the Human Protein Atlas has 10,475 entries with supportive evidence. neXtProt, PeptideAtlas, and GPMDB are rich resources of information about PTMs, SAAVs, and splice isoforms. Meanwhile, the Biology and Disease-driven B/D-HPP has created comprehensive SRM resources, generated popular-protein lists to guide targeted proteomics assays for specific diseases, and launched an Early Career Researchers initiative.

INTRODUCTION

The HUPO (www.hupo.org) Human Proteome Project (HPP) (www.thehpp.org) is pursuing two overall goals¹: (1) stepwise completion of the protein parts list, the draft human proteome, confidently identifying and characterizing at least one protein product from each protein-coding gene, with increasing emphasis on the sequence variants, post-translational modifications, and splice isoforms of those proteins; and (2) making proteomics an increasingly integrated component of multi-omics analyses throughout the biomedical and life sciences community through advances in assays, instruments, and knowledgebases for identification, quantitation, and functional assessment of proteins and proteoforms in diverse biological systems. There are 50 HPP research teams worldwide organized as the Chromosome-centric C-HPP, the Biology and Disease-driven B/D-HPP, and the Affinity-Based Protein Capture, Mass Spectrometry, and Knowledgebase (Bioinformatics) Resource Pillars. This article is part of the fourth annual C-HPP-led special issue of the Journal of Proteome Research²,³,⁴.

Periodic updates of PeptideAtlas (www.peptideatlas.org) and neXtProt (www.neXtProt.org) are organized in an approximately annual cycle tied to the pipeline for submission, review, and publication of the C-HPP-led special issue of JPR in time for the HUPO World Congress in September. This year the reference versions of these databases are PeptideAtlas 2016-01 and neXtProt 2016-02 (released 2016-01-11). Specific Guidelines v2.0 were generated, discussed, and approved at the HUPO2015 World Congress in Vancouver and released together with the Call for Papers for this C-HPP special issue in November 2015.
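
To make the Guidelines v2.0 peptide criterion from the abstract concrete (two non-nested, uniquely-mapping peptides of at least 9 aa), the following is a minimal Python sketch, not the actual neXtProt or PeptideAtlas pipeline. It assumes that "non-nested" means neither peptide sequence is fully contained in the other, and that unique mapping has already been determined upstream and is passed in as a flag. The function names and the example sequences are hypothetical.

```python
from itertools import combinations

MIN_LENGTH = 9  # minimum peptide length in amino acids, as stated in the text


def is_nested(a: str, b: str) -> bool:
    """Return True if either sequence is fully contained in the other.

    Assumption: this is how "nested" is interpreted in this sketch.
    """
    return a in b or b in a


def meets_two_peptide_rule(peptides) -> bool:
    """Check one protein's peptide evidence against the two-peptide criterion.

    `peptides` is an iterable of (sequence, uniquely_mapping) pairs, where
    `uniquely_mapping` is a precomputed boolean (hypothetical input format).
    """
    # Keep only uniquely mapping peptides that are long enough; the set
    # also removes repeated observations of the same sequence.
    eligible = {
        seq for seq, unique in peptides
        if unique and len(seq) >= MIN_LENGTH
    }
    # The criterion holds if any pair of eligible peptides is non-nested.
    return any(not is_nested(a, b) for a, b in combinations(sorted(eligible), 2))


if __name__ == "__main__":
    # Hypothetical sequences, for illustration only.
    passing = [("AAAAAAAAAK", True), ("CCCCCCCCCR", True)]
    nested = [("LSEPAELTDAVK", True), ("SEPAELTDAVK", True)]
    print(meets_two_peptide_rule(passing))  # True
    print(meets_two_peptide_rule(nested))   # False
```

Checking pairs rather than simply counting eligible peptides matters here: a protein with two qualifying peptides where one lies inside the other would pass a naive count but fails the non-nested requirement.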