Completing the C-HPP neXt-MP50 Challenge for Chromosome 17 (2016-2019)

Hongjiu Zhang, PhD, Bioinformatics Student
Gilbert S. Omenn, MD, PhD, Harold T. Shapiro Distinguished University Professor
Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
Chair, HUPO Human Proteome Project

21st C-HPP Workshop: “Dark Proteome”, Saint-Malo, France, 12 May 2019

Missing Protein Strategy Focused on Annotation, following the Chr 2/14 Consortium (Duek et al., 2016)

The number of PE2,3,4 Chr 17 missing proteins was reduced from 148 in neXtProt release 2016-01 to 105 in release 2018-01. That gave us 43 new PE1 proteins toward the neXt-MP50 goal of 50, officially announced by the C-HPP in September 2016 at the Sun Moon Lake HPP Workshop. Progress for Chromosome 17 was as follows:

neXtProt version  PE2+3+4  PE2  PE3  PE4  [PE5]
2016-01           148      123  19   6    23
2017-01           125      103  17   5    23
2017-08           114      98   12   4    23
2018-01           105      88   13   4    23

Figure: How 43 MPs Were Upgraded to PE1 between 2016 and 2018 in neXtProt by MS +/- PPI

Omer Siddiqui, Hongjiu Zhang, Yuanfang Guan, Gil Omenn

Overall Strategy for Finding the Remaining 105 Chr 17 MPs with MS or PPI

For MS, 99 of the 105 have 2 predicted proteotypic peptides; 29 already have one such peptide annotated in neXtProt. Among the 29 with a single proteotypic peptide in PeptideAtlas (PA)/neXtProt, we found a second, non-nested "stranded" peptide for 7 in GPMdb, with a PXD identifier and data in PRIDE.
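The "second non-nested peptide" screen described above can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the function names are hypothetical, the 9-residue minimum length is an assumption taken from the HPP MS Data Interpretation Guidelines (v2.1) rather than from this poster, and the check for unique mapping to the proteome (proteotypicity) is deliberately omitted. The PIRT peptide DLLPSQTASSLCISSR from this poster is used only as an illustrative input; the candidate peptides are invented.

```python
"""Screen candidate peptides for a second, non-nested peptide (illustrative sketch).

Assumptions: function names are hypothetical, MIN_LENGTH is an assumed
HPP-guideline value, and uniqueness of mapping is NOT checked here.
"""

MIN_LENGTH = 9  # assumed minimum peptide length in residues (HPP guidelines v2.1)


def is_nested(a: str, b: str) -> bool:
    """True if one peptide sequence lies entirely within the other."""
    return a in b or b in a


def stranded_candidates(annotated: list[str], observed: list[str]) -> list[str]:
    """Observed peptides that are long enough and not nested with any annotated peptide."""
    return [
        pep
        for pep in observed
        if len(pep) >= MIN_LENGTH
        and not any(is_nested(pep, known) for known in annotated)
    ]


if __name__ == "__main__":
    annotated = ["DLLPSQTASSLCISSR"]  # PIRT peptide named on this poster
    observed = [
        "LLPSQTASSLCISSR",  # hypothetical: nested inside the annotated peptide
        "AEGLVNTSPEFR",     # hypothetical: distinct 12-residue peptide
        "VGSNK",            # hypothetical: shorter than MIN_LENGTH
    ]
    print(stranded_candidates(annotated, observed))
```

Running it prints `['AEGLVNTSPEFR']`: the first candidate is rejected as nested and the last as too short. Overlapping but non-nested peptides pass this simplified check.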

Figure: Spectral Match of Observed and Synthetic DLLPSQTASSLCISSR Peptide for PIRT

Summary of 35 High-Priority Chr 17 MPs

The Net of 7 Fewer PE2,3,4 Missing Proteins from Chromosome 17

Table 1 shows the 13 MPs promoted to PE1, plus 3 new neXtProt entries directly assigned to PE1 (CD300H, SPEM3, and SMIM36). Table 2 shows the 5 PE1 proteins demoted to PE2 or PE3.

neXtProt is certainly a dynamic database!

Table 1. Proteins Promoted or Appointed to PE1 during 2018-2019

neXtProt ID      Gene name   Update
NX_A6NF36        CCDC182     11 promoted from PE2
NX_A8MZ36        EVPLL
NX_O15375        SLC16A5
NX_P29275        ADORA2B
NX_Q6DHY5        TBC1D3G
NX_Q6UXU6        TMEM92
NX_Q86WR6        C17orf64
NX_Q8IVW1        ARL17B
NX_Q9BYQ8        KRTAP4-9
NX_Q9ULX5        RNF112
NX_Q9UQ05        KCNH4
NX_A8MTY7        KRTAP9-7    1 promoted from PE3
NX_A6NCQ9        RNF222      1 promoted from PE4
NX_A0A0K2S4Q6    CD300H      3 new entries
NX_A0A1B0GUW6    SPEM3
NX_A0A1B0GVT2    SMIM36

Table 2. Proteins Demoted from PE1 from 2018-01 to 2019-01

neXtProt ID      Gene name   Update
NX_O14610        GNGT2       4 demoted to PE2
NX_O14894        TM4SF5
NX_Q07627        KRTAP1-1
NX_Q8N9I5        FADS6
NX_Q9BYP8        KRTAP17-1   1 demoted to PE3

Detailed Changes for Chr 17

Among the 13 newly promoted MPs now at PE1, RNF222, SLC16A5, and EVPLL are proteins we had highlighted as having potential stranded-peptide spectra [1]; they are now confirmed with fresh spectral evidence. Among the 11 promoted from PE2 to PE1, ADORA2B, C17orf64, and RNF112 had been identified as priority candidates [1]. In addition to the 5 proteins demoted from PE1, neXtProt release 2019-01 removed a former PE2 entry (NBR2), demoted a former PE2 entry (FAM215A) to PE5, and included 3 new MPs for Chromosome 17: RNF227 (PE2) and ANKRD40CL (PE4), both promoted from PE5, and PVALEF, a new PE3 entry.

Goal of Identifying 50 Missing Proteins Has Been Met

Summary: progress for Chromosome 17 over 3 years was as follows:

neXtProt version  PE2+3+4  PE2  PE3  PE4  [PE5]
2016-01           148      123  19   6    23
2017-01           125      103  17   5    23
2018-01           105      88   13   4    23
2019-01           98*      82   13   3    24

*11 PE2, 1 PE3, and 1 PE4 promoted to PE1; 5 PE1 demoted to PE2/PE3; 1 new PE3 entry.

Conclusion

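Chromosome 17 is the first of the 24 chromosomes (excluding the mitochondrial genome) to achieve the neXt-MP50 Challenge goal, with a net reduction of 50 PE2,3,4 missing proteins, from 148 in neXtProt 2016-01 to 98 in 2019-01. Similar analyses by or for the other chromosome teams could accelerate progress across the entire proteome.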
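The net arithmetic behind this conclusion can be checked with a short Python sketch. The counts are transcribed from the Chromosome 17 progress table; the function names are hypothetical, and the footnote check covers only the combined PE2+3+4 total, not each PE column.

```python
"""Net progress of Chromosome 17 missing proteins toward the neXt-MP50 goal.

Counts are transcribed from the neXtProt progress table; names are illustrative.
"""

# (PE2, PE3, PE4) counts for Chr 17 in each neXtProt release
RELEASES = {
    "2016-01": (123, 19, 6),
    "2017-01": (103, 17, 5),
    "2017-08": (98, 12, 4),
    "2018-01": (88, 13, 4),
    "2019-01": (82, 13, 3),
}
GOAL = 50  # neXt-MP50 target announced by the C-HPP in September 2016


def missing_total(release: str) -> int:
    """Missing proteins = PE2 + PE3 + PE4 entries in one release."""
    return sum(RELEASES[release])


def net_reduction(start: str, end: str) -> int:
    """Net drop in missing proteins between two releases."""
    return missing_total(start) - missing_total(end)


def apply_footnote(previous: int, promoted: int, demoted: int, new_entries: int) -> int:
    """Promotions to PE1 leave the pool; demotions from PE1 and new PE2-4 entries join it."""
    return previous - promoted + demoted + new_entries


if __name__ == "__main__":
    for release in RELEASES:
        print(release, missing_total(release))
    # 2018-01 -> 2019-01: 11 PE2 + 1 PE3 + 1 PE4 promoted, 5 demoted, 1 new PE3
    print("2019-01 via footnote:", apply_footnote(missing_total("2018-01"), 13, 5, 1))
    reduced = net_reduction("2016-01", "2019-01")
    print("net reduction:", reduced, "goal met:", reduced >= GOAL)
```

Running it prints the totals 148, 125, 114, 105, and 98, reproduces 98 from the footnote (105 - 13 + 5 + 1), and reports a net reduction of 50, meeting the goal.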

Function Annotation for uPE1 Proteins, Comparing I-TASSER/COFACTOR Predictions with neXtProt Curation

Chengxin Zhang, Bioinformatics Student
Gilbert S. Omenn, MD, PhD, Harold T. Shapiro Distinguished University Professor
Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
Chair, HUPO Human Proteome Project

21st C-HPP Workshop: “Dark Proteome”, Saint-Malo, France, 13 May 2019

Predicting uPE1 Functions as GO Terms with I-TASSER and COFACTOR Algorithms

Figure: GO Term Prediction Accuracy (Fmax) of Several Methods, Benchmarked on 100 Random PE1 Chr 17 Proteins

(Zhang C, Wei X, Omenn GS, Zhang Y. J Proteome Res 2018)

Figure: Results for GO molecular function (MF), biological process (BP), and cellular component (CC) terms for the 66 uPE1 Chromosome 17 proteins (13, 33, and 49 exceed the thresholds)

Figure: Simplified I-TASSER/COFACTOR Pipeline for Function Annotation
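Fmax, the accuracy measure in the benchmark, is commonly defined (as in the CAFA challenges) as the maximum over score thresholds of the harmonic mean of average precision and average recall. The sketch below assumes that protein-centric definition, thresholds in steps of 0.01, GO terms already propagated to their ancestors, and at least one true term per benchmark protein. It is not the code used in the cited study, and the protein and term IDs in the example are placeholders.

```python
"""Protein-centric Fmax for GO-term predictions (CAFA-style sketch).

Assumptions: terms are already propagated to ancestors, thresholds step by
1/steps, and every benchmark protein has at least one true term.
"""


def fmax(predictions: dict[str, dict[str, float]],
         truth: dict[str, set[str]],
         steps: int = 100) -> float:
    best = 0.0
    for i in range(1, steps + 1):
        tau = i / steps
        precisions = []   # one value per protein with >= 1 prediction at tau
        recall_sum = 0.0  # accumulated over all benchmark proteins
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, score in scores.items() if score >= tau}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # no protein has a prediction at this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


if __name__ == "__main__":
    truth = {"P1": {"T1", "T2"}, "P2": {"T3"}}                       # hypothetical
    predictions = {"P1": {"T1": 0.9, "T4": 0.4}, "P2": {"T3": 0.6}}  # hypothetical
    print(round(fmax(predictions, truth), 3))
```

In this toy example the best threshold lies between 0.41 and 0.60, where precision is 1.0 and recall is 0.75, so the script prints 0.857. Precision is averaged only over proteins with at least one prediction, while recall is averaged over all benchmark proteins, so a method cannot inflate Fmax by predicting for only a few easy proteins.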

Figure: Gene families on Chromosome 17 by cytogenetic band, e.g., keratins (28/56), keratin-associated proteins (25/104), olfactory receptors, CD300 molecule-like (6/7), alpha-N-acetylgalactosaminide alpha-2,6-sialyltransferases (2/5), cytokines (20/28), C-C chemokine ligands (19/38), and myosins [7(13)/65]

C-HPP Chromosome 17 Initiatives
Proteogenomic analyses of transcript expression and co-expression from the ERBB2 amplicon (17q12-21) and its downstream networks
Identification and network biology of alternative splice isoforms of ERBB2, EGFR (ERBB1), and related proteins
Predictions of protein folding and function
Isoform-level network maps, built around the highest-connected isoforms
Finding the 105 Chr 17 missing proteins and annotating the 66 Chr 17 uPE1 proteins

The ERBB2 Amplicon

The ERBB2 amplicon can range from the ERBB2-GRB7 pair up to 23 adjacent genes. Many, but not all, of these genes were observed in our studies to be over-expressed as transcripts and proteins. 13 of the top 20 genes associated with ERBB2+ breast cancer (BRCA) are on Chr 17. Functional annotations point to actin cytoskeletal reorganization, regulation of apoptosis, stabilization of phosphorylated ERBB2 (P-ERBB2), ER-mediated Ca2+ signaling, and DNA synthesis.

Figure: Top 20 genes from all chromosomes ranked according to their mRNA expression across 10 ERBB2+ breast cancer datasets in Oncomine; 13 of these genes are on Chr 17 (the 7 exceptions are KMO, TMEM45B, SLC22A23, TMEM86A, FHOD1, CATSPERB, and GPCPD1).

The Recognition and Emergence of Splice Variants

Multicellular organisms have evolved remarkable splicing mechanisms to convert heterogeneous nuclear RNA transcribed from multi-exon genes into mRNAs. There may be several or numerous alternative transcripts per coding gene, and alternative splicing is now recognized to be ubiquitous. The implications are huge: when we speak or write about "up-" or "down-" regulated expression of genes, we must recognize that we are dealing with a mixture of gene products whose actions cannot reliably be related to their combined protein concentration. The structure/function relationships of each variant must be elucidated.
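A minimal, hypothetical illustration of why a combined concentration can mislead: if two isoforms of one gene swap abundance between two conditions, the gene-level total is unchanged while each isoform changes fourfold in opposite directions. The isoform names and abundances below are invented for illustration, not measurements.

```python
"""Gene-level totals can hide isoform-level changes (hypothetical numbers)."""

import math

# Hypothetical isoform abundances of one gene in two conditions (arbitrary units)
condition_a = {"isoform_1": 80.0, "isoform_2": 20.0}
condition_b = {"isoform_1": 20.0, "isoform_2": 80.0}


def log2_fold_change(before: float, after: float) -> float:
    """log2(after / before): positive means up, negative means down."""
    return math.log2(after / before)


def gene_level_change(a: dict[str, float], b: dict[str, float]) -> float:
    """Fold change of the summed abundance across all isoforms."""
    return log2_fold_change(sum(a.values()), sum(b.values()))


def isoform_level_changes(a: dict[str, float], b: dict[str, float]) -> dict[str, float]:
    """Fold change of each isoform separately."""
    return {iso: log2_fold_change(a[iso], b[iso]) for iso in a}


if __name__ == "__main__":
    print("gene level:", gene_level_change(condition_a, condition_b))
    print("isoform level:", isoform_level_changes(condition_a, condition_b))
```

The gene-level log2 fold change is 0.0, suggesting "no change", while the isoform-level values are -2.0 and 2.0.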