
Human Proteome Project 2019 HUPO Council Report (Adelaide, Australia)

This 2019 HUPO Council Report outlines scientific, management and outreach progress made by the HUPO Proteome Project team in 2018-2019 and contains key extracts taken from the following:
• 2019 HPP JPR Special Issue Metrics Paper (by Gil Omenn/HPP EC)
• C-HPP 2019 Report (by Chris Overall, Lydie Lane and Young-Ki Paik)
• B/D-HPP 2018-2019 Questionnaire Report (by Fernando Corrales)
• HPP Thursday Future Workshop presentation to HPP SSAB at HUPO2018 Orlando (by Mark Baker)
• HPP Achievements/Legacies Doc (by HPP EC)
• HPP Governance Committee Doc (by HPP EC)


Progress on Identifying and Characterizing the Human Proteome: 2018-2019 Metrics from the HUPO Human Proteome Project
(Excerpt from Draft JPR HPP Special Issue submission under consideration)

Gilbert S. Omenn, Lydie Lane, Christopher M. Overall, Fernando J. Corrales, Jochen M. Schwenk, Young-Ki Paik, Jennifer E. Van Eyk, Siqi Liu, Stephen Pennington, Michael P. Snyder, Mark S. Baker, Eric W. Deutsch

ABSTRACT: The Human Proteome Project (HPP) annually reports on progress made throughout the field in credibly identifying and characterizing the complete human protein parts list and making proteomics an integral part of multiomics studies in medicine and the life sciences. neXtProt release 2019-01-11 contains 17 694 proteins with strong protein-level evidence (PE1), which represent 89% of all 19 823 neXtProt predicted coding genes (all PE1, 2, 3, 4 proteins), up from 17 470 one year earlier. Conversely, the number of neXtProt PE2, 3, 4 proteins, termed the “missing proteins” (MPs), has been reduced from 2949 to 2129 since 2016 through efforts throughout the community, including the chromosome-centric HPP. PeptideAtlas is the source of uniformly reanalysed raw mass spectrometry data for neXtProt; PeptideAtlas added 495 canonical proteins between 2018 and 2019, especially from studies designed to detect hard-to-identify proteins. Meanwhile, the Human Protein Atlas has released version 18.1 with immunohistochemical evidence of expression of 17 000 proteins and survival plots as part of the Pathology Atlas. Many investigators apply multiplexed SRM-targeted proteomics for quantitation of organ-specific popular proteins in studies of various human diseases. The 19 teams of the Biology and Disease-driven B/D-HPP published a total of 382 publications in 2018, bringing proteomics to a broad array of biomedical research.
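The headline counts in the abstract can be cross-checked with simple arithmetic. The short sketch below uses only the figures quoted above; the variable names are illustrative and are not neXtProt field names.

```python
# Counts quoted in the abstract (neXtProt release 2019-01-11).
pe1 = 17_694          # proteins with strong protein-level evidence (PE1)
pe1_to_pe4 = 19_823   # all predicted coding genes (PE1, 2, 3, 4)
pe1_last_year = 17_470
mp_2016 = 2_949       # missing proteins reported for 2016

missing = pe1_to_pe4 - pe1               # PE2, 3, 4 "missing proteins"
pe1_share = pe1 / pe1_to_pe4             # fraction of predicted proteins at PE1
pe1_gain = pe1 - pe1_last_year           # net new PE1 proteins in one year
net_mp_reduction = mp_2016 - missing     # net drop in MPs since 2016

print(f"Missing proteins (PE2,3,4): {missing}")
print(f"PE1 share of PE1-4 entries: {pe1_share:.1%}")
print(f"Net PE1 gain over one year: {pe1_gain}")
print(f"Net MP reduction since 2016: {net_mp_reduction}")
```

The difference between the PE1-4 total and the PE1 count reproduces the 2129 MPs stated in the abstract, and the PE1 share rounds to the quoted 89%.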

Table 1. neXtProt Protein Evidence Levels from 2012 to 2019: Progress in Identifying PE1 and PE2,3,4 Missing Proteins


Fig. 1 (above). Pie charts showing distribution of human proteins based on the type of protein evidence data. Shown here are the numbers of PE1 proteins (green plus yellow), PE2,3,4 MPs, and PE5 entries as of neXtProt releases 2018−01 and the most recent release 2019-01.

Fig 2 (above). These flowcharts depict the changes in neXtProt PE1−5 categories from release 2017−01 to 2018−01 (left) and from 2018-01 to 2019-01 (right).

HPP Mass Spectrometry Data Interpretation Guidelines 3.0

At a full-day workshop prior to the 21st Saint-Malo C-HPP Workshop in May 2019, the HPP leadership gathered for an extended discussion of 25 open questions from the HPP Knowledgebase Resource Pillar and other HPP teams relating to updates of the HPP Mass Spectrometry Data Interpretation Guidelines 2.1 and implementation of the workflow by which the HPP confirms the translation of potential coding genes. Each of these questions was debated, potential solutions listed, and consensus decisions achieved by the participants. The result will be an update of the Guidelines from version 2.1 to version 3.0, with a manuscript (Deutsch et al, pending for this issue) describing the set of questions, potential solutions, proposed consensus decisions, and the logic behind those proposals. For the checklist required for submitted manuscripts, numbered items will be re-factored into logical subgroups and authors will be required to provide page numbers on the checklist so reviewers can see where specific guidelines are addressed in the manuscripts. The term “extraordinary detection claims,” which appears to have caused some confusion, will be replaced with “new PE1 protein detection claims.” The revised guidelines will refine how peptide nesting is defined and specify how sequence-identical protein entries should be handled.

Two new guidelines will be added: (1) the provision of Universal Spectrum Identifiers (USI; http://psidev.info/USI), a feature developed by the HUPO Proteomics Standards Initiative (PSI) that uniquely identifies, in a form suitable for searching across proteomics repositories, a particular spectrum being held up as evidence for a new PE1 protein detection claim; and (2) a guideline for handling HPP datasets that use data independent acquisition (DIA) workflows including SWATH-MS. These guideline changes are expected to take effect for contributions to the 2020 JPR HPP Special Issue.
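To illustrate what a USI carries, the sketch below splits an identifier into its colon-separated parts as laid out in the PSI specification (prefix, dataset collection, MS run name, index type, index number and an optional interpretation). It is a simplified reading, not the PSI reference parser: the class and function names are hypothetical, the set of allowed index types is an assumption, and run names that themselves contain colons are not handled.

```python
from dataclasses import dataclass
from typing import Optional

# Index types assumed here for illustration; consult the PSI specification.
ALLOWED_INDEX_TYPES = {"scan", "index", "nativeId"}


@dataclass
class ParsedUSI:
    collection: str                 # dataset identifier, e.g. a ProteomeXchange accession
    ms_run: str                     # MS run (file) name within the dataset
    index_type: str                 # how the spectrum is addressed within the run
    index: str                      # spectrum index value
    interpretation: Optional[str]   # optional peptide/charge interpretation


def parse_usi(usi: str) -> ParsedUSI:
    """Split a USI string into its parts (simplified sketch)."""
    parts = usi.split(":", 5)       # at most 6 fields; the last may hold the interpretation
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"Not a recognisable USI: {usi!r}")
    _, collection, ms_run, index_type, index = parts[:5]
    if index_type not in ALLOWED_INDEX_TYPES:
        raise ValueError(f"Unexpected index type: {index_type!r}")
    interpretation = parts[5] if len(parts) == 6 else None
    return ParsedUSI(collection, ms_run, index_type, index, interpretation)


example = parse_usi(
    "mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2"
)
print(example.collection, example.index_type, example.index, example.interpretation)
```

A manuscript claiming a new PE1 detection would list one such identifier per supporting spectrum, so that a reviewer can retrieve the exact spectrum from a repository.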

At the workshop, several changes to the overall pipeline for tracking detections of MPs were considered. The current pipeline begins with deposition to ProteomeXchange, reprocessing of datasets by PeptideAtlas, and final mapping and incorporation by neXtProt. It was agreed that there would be no substantial change to the basic set of guidelines for calling a protein successfully identified by mass spectrometry. However, the meaning of the terms non-nested and uniquely-mapping has been interpreted and implemented in slightly different ways among the different components of the pipeline. A consensus interpretation was clarified and will be documented.
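Because the terms non-nested and uniquely-mapping have been read in slightly different ways, a small worked example may help fix ideas. The sketch below encodes one plausible reading only: a peptide is uniquely mapping if its sequence occurs in exactly one protein entry, and two peptides are nested if one sequence lies wholly within the other. The 9-residue minimum length is assumed from Guidelines 2.1, the sequences and accessions are invented toy data, and the function names are hypothetical. The consensus interpretation documented by the HPP should be taken from the Guidelines 3.0 manuscript, not from this sketch.

```python
from typing import Dict, List, Tuple


def is_nested(a: str, b: str) -> bool:
    """True if one peptide sequence lies wholly within the other."""
    return a in b or b in a


def maps_uniquely(peptide: str, target: str, proteome: Dict[str, str]) -> bool:
    """True if the peptide occurs in the target entry and in no other entry."""
    hits = [acc for acc, seq in proteome.items() if peptide in seq]
    return hits == [target]


def qualifying_pairs(
    peptides: List[str], target: str, proteome: Dict[str, str], min_len: int = 9
) -> List[Tuple[str, str]]:
    """Pairs of sufficiently long, uniquely mapping, non-nested peptides."""
    usable = [
        p for p in peptides
        if len(p) >= min_len and maps_uniquely(p, target, proteome)
    ]
    return [
        (a, b)
        for i, a in enumerate(usable)
        for b in usable[i + 1:]
        if not is_nested(a, b)
    ]


# Toy data: invented sequences and accessions, for illustration only.
PROTEOME = {
    "TOY_TARGET": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "TOY_OTHER": "MAAAQLEERLGLIEVQPPP",
}
PEPTIDES = [
    "TAYIAKQRQISFVK",      # unique to the target
    "AYIAKQRQISFVK",       # unique, but nested inside the first peptide
    "SHFSRQLEERLGLIEVQ",   # unique to the target
    "QLEERLGLIEVQ",        # shared with TOY_OTHER, so not uniquely mapping
    "LEER",                # too short
]

pairs = qualifying_pairs(PEPTIDES, "TOY_TARGET", PROTEOME)
for a, b in pairs:
    print(a, "+", b)
```

In this toy case the nested pair is rejected, while either of the two overlapping N-terminal peptides can be paired with the C-terminal peptide.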

For proteins that come close to meeting the guidelines, but do not meet them due to their extreme sequence composition (e.g., very short or very hydrophobic), it was decided that complex exception rules are not yet warranted. Rather, a panel of researchers including neXtProt curators will be established to review evidence for special cases and classify proteins as PE1 if the available evidence is compelling but falls short of the Guidelines 3.0 due to valid physicochemical reasons. No proteins would be declared too difficult to detect yet, although there was substantial discussion about the extreme difficulties of detecting olfactory receptors and other categories of membrane-bound proteins, let alone those proteins predicted to come from purported genes lacking measurable transcripts. Discussion sections of previous JPR HPP Special Issue Metrics papers have addressed these challenges.

Finally, a plan was initiated for incorporating the dataset reprocessing results of MassIVE-KB ProteinExplorer, including BioPlex data (while excluding bait) through the PeptideAtlas TPP to feed into the 2020 neXtProt HPP release. Refinements to the overall HPP pipeline for tracking high stringency identification of MPs should facilitate confident completion of the attainable protein parts list over the next several years.

Table 2 (below) presents a detailed chromosome-by-chromosome accounting of the status of the MPs search, elaborated from neXtProt 2019-01. Also tabulated are the numbers of functionally unannotated PE1 proteins (uPE1) of the human proteome. Together the PE2,3,4 MPs and the uPE1 proteins constitute a large part of what has been termed the “Dark Proteome” (DP).

See https://www.nextprot.org/about/protein-existence


The Chromosome Centric Human Proteome Project (C-HPP) Annual Report 2018-2019 prepared for the 2019 HUPO Council

Submitted July 31, 2019 by:

Christopher M. Overall (Chair), Young-Ki Paik (Co-Chair), Lydie Lane (Co-Chair), on behalf of the C-HPP Executive Committee

1. Name of Initiative: Chromosome-Centric Human Proteome Project (C-HPP)

2. Name of Committee Chair: Chair: Christopher M. Overall, Co-Chairs: Young-Ki Paik, Lydie Lane

3. Names of Committee Members:

C-HPP Executive Committee (EC):

Chair: Christopher M. Overall, Canada, to Dec 31, 2021
Co-Chair: Lydie Lane, Switzerland, to Dec 31, 2020
Co-Chair: Young-Ki Paik, Korea, to Dec 31, 2021
Secretary General: Peter Horvatovich, The Netherlands, to Dec 31, 2019
Member-at-Large: Pengyuan Yang, China, to Dec 31, 2021
Member-at-Large: Fernando Corrales, Spain, to Dec 31, 2019
Member-at-Large: Gilberto Domont, Brazil, to Dec 31, 2021

Principal Investigators Council (PIC):

Chromosome 1: Ping Xu, China
Chromosome 2: Lydie Lane, Switzerland
Chromosome 3: Takeshi Kawamura, Japan
Chromosome 4: Yu Ju Chen, Taiwan
Chromosome 5: Peter Horvatovich, The Netherlands
Chromosome 6: Rob Moritz, USA/Canada
Chromosome 7: Edouard Nice, Australia
Chromosome 8: Pengyuan Yang, China
Chromosome 9: Je-Yoel Cho, Korea
Chromosome 10: Josh Labaer, USA
Chromosome 11: Jong Shin Yoo, Korea
Chromosome 12: Ravi Sirdeshmukh, India
Chromosome 13: Young-Ki Paik, Korea
Chromosome 14: Charles Pineau, France
Chromosome 15: Gilberto Domont, Brazil
Chromosome 16: Fernando Corrales, Spain
Chromosome 17: Gilbert S. Omenn, USA
Chromosome 18: Alexander Archakov, Russia
Chromosome 19: Sergio Encarnacion, Mexico
Chromosome 20: Siqi Liu, China
Chromosome 21: Albert Sickmann, Germany
Chromosome 22: Akhilesh Pandey, USA
Chromosome X: Yasushi Ishihama, Japan
Chromosome Y: Ghasem Hosseini Salekdeh, Iran
Mitochondrial: Andrea Urbani, Italy

4. C-HPP Mission and Objectives

The mission of the C-HPP is to map and annotate the entire human proteome comprising the individual proteins encoded by each chromosome, their major splice forms, mature N- and C-termini, and their major protein post-translational modifications (PTMs) (see HUPO.org). In the C-HPP this is accomplished by directed studies initiated by the 25 international chromosome and mitochondrial DNA teams. Effective collaborations exist between the chromosome teams and other members of HUPO within the 19 B/D-HPP initiatives and the 4 HPP Pillars.


Phase 1 of the HPP is focused on identifying by mass spectrometry all human proteins, presently estimated to number 20,399 (neXtProt 2019-01-11). Of these, some 17,694 have strong protein-level evidence (PE1), while the PE2–4 proteins remain to be detected at the protein level and are the so-called “missing proteins” (MPs). At present, 2,129 MPs (PE2–4) remain to be identified. At the Santiago C-HPP-2018 workshop, the neXt-CP50 Challenge was launched to functionalize proteins in the “Dark Proteome” with no known function, whether predicted or described. In 2018 these numbered 1,937 among the PE1–4 proteins.

Phase 2 will focus on the neXt-CP2000 (to functionalize 2,000 uPE1s), ~5 PTMs / PE1 protein, and their splice forms. Also targeted are nonconventional-encoded small open reading frame translation products (smORFs), fusion proteoforms, and translatable products of long non-coding (lnc) RNAs.

5. Summary of Recent Accomplishments, Current Activities, and Tasks

A. neXt-MP50: The neXt-MP50 Challenge was launched at Sun Moon Lake C-HPP-2015 to encourage the Chr teams to identify 50 new MPs each from 2,949 MPs (2016) and to devise and employ innovative approaches to uncover MPs. This challenge has been extended past the original two-year window. Semi-annual reports from each chromosome team are posted on the C-HPP Wiki. The number of MPs declined from 2,168 (neXtProt 2018-01-17) to 2,129 now (neXtProt 2019-01-11), with a number of Chromosome teams having completed the neXt-MP50: Chromosomes 1, 2, 5, 17, 19, and X. However, the number of protein entries also increased from 20,230 (neXtProt 2018-01-17) to 20,399 (neXtProt 2019-01-11). Thus, the decreasing number of MPs found each year reflects both the increasing difficulty of devising and executing deep discovery of MPs in the human proteome and the occasional realignment of protein-encoding gene numbers and PE assignments by database curators. At Orlando HUPO-2018 the chromosome teams reported around 401 MPs that were identified in 2018 according to the C-HPP guidelines v2.1.

B. neXt-CP50: With the official launch of the neXt-CP50 challenge, the goal is to characterize 50 uPE1 proteins within 3 years, with 15 Chromosome teams participating to date. To start this challenge with realistically attainable goals, only those uncharacterized (u) proteins that have already been positively identified at the protein level (PE1) are being analyzed. In March 2018 there were 1,937 proteins with no known function, of which 1,260 were PE1 proteins, now termed uPE1 proteins (doi: 10.1021/acs.jproteome.8b00383). In 2019 there are 1,254 uPE1 proteins out of 2,222 uPE1–4 entries in neXtProt, according to SPARQL query NXQ_00022. These proteins have been identified by mass spectrometry, with archetypic peptides reported from some tissues and cells, providing starting points for uPE1 projects.

C. 21st C-HPP Symposium, St Malo, France: On May 12 – 14, 2019 the C-HPP held its Semi-Annual Workshop, the 21st in a series of highly successful updates on the chromosome teams' progress toward completion of the HPP, with a focus on the newly initiated ‘Dark Proteins’ neXt-CP50 challenge. Charles Pineau was the local organizer, par excellence: French cuisine, wine and la belle vie flowed through the meeting, which was attended by 43 registrants. The Journal of Proteome Research (JPR) partnered with the C-HPP and heavily publicized the workshop by JPR splash page “sliders” with updates on the C-HPP Wiki and the HUPO web sites. On May 11, a one-day session on Mass Spectrometry Data Interpretation Guidelines 3.0 was led by Eric Deutsch (ISB, Seattle, HPP Working Group) to update the HPP guidelines v2.1 for identification of MPs. Vigorous debate and discussion successfully resolved 25 open questions (see HPP Chair’s report). The 2019 Special Issue (SI) of JPR will present a paper on the new v3.0 Guidelines. The scientific program included 5 invited talks and 23 talks selected from the abstracts on various aspects of the human proteome and the dark proteome, including a summary of the changes to the HPP Data Interpretation Guidelines by Eric Deutsch, updates on neXtProt by Lydie Lane, neXt-MP50 by Chris Overall, and the neXt-CP50 by Young-Ki Paik. Anne-Claude Gingras (Toronto, Canada) presented proximity-dependent biotinylation (BioID) for revealing the localization of human cellular proteins that could be forward/reverse exploited for MP identification. Nuno Bandeira presented ProteinExplorer for integrating community-scale big data for assessing protein existence. Yves Vandenbrouck (Chr 14) explored the dark side of the human proteome using ProteoRE, and Fernando Corrales (Chair B/D-HPP, C-HPP EC, Head Chr 16) discussed activities and ideas for collaboration between the B/D-HPP and chromosome teams.

D. Publication of the Special Issue of the HPP in the Journal of Proteome Research: On December 7, 2018, the sixth annual special issue (SI) of the HPP was published in the Journal of Proteome Research, Volume 17, Issue 12, pages 4023–4358. Associate Editor: Christopher M. Overall; Guest Editors: Young-Ki Paik, Eric Deutsch, Fernando Corrales, Lydie Lane, and Gil Omenn. Formerly this SI was dedicated to the C-HPP, but in 2017 we expanded its scope to all the HPP. In this issue, 32 papers covered 4 major research topics: (i) missing proteins (MPs), (ii) uPE1 proteins, (iii) bioinformatics tool development and (iv) biology/disease proteomes. The launch of the neXt-CP50 was discussed in Young-Ki Paik et al (Chr 13). Deutsch et al presented the use of spectral library searches in proteomics workflows. The work of Macron et al (Chr 2) described identification of 12 MP candidates in human cerebrospinal fluid following immunodepletion and TMT labeling, from which 8 MPs were found. Sun et al (Chr 1) presented a study on human testis, using multiple proteases and high and low pH deep proteomics analysis, and identified 14 MPs. The study by He et al (Chr 1) used LysargiNase to identify and validate 2 MPs from 7 MP candidates. Pullman et al presented ProteinExplorer, a tool to explore the large amount of reanalyzed public proteomics data available in MassIVE, which allowed validation of HPP-compliant evidence for 107 MPs (PE2, PE3, and PE4) and 23 dubious (PE5) proteins. Fernando Corrales (Chr 16) used the new letter format for the identification of the long-sought hyaluronan synthase 1, which surprisingly had remained an MP until then. The C-HPP HQ office in Korea has freely distributed one copy of the printed version to all C-HPP PIs and HPP leaders. The 2019 HPP Special Issue to date has 23 submissions in progress or accepted, with a number of papers rejected. Over the past 3 years, SI submissions have steadily increased. SI papers track consistently well 2 years after publication: their citation rates are higher than, or not significantly different from, those of standard JPR issues, and their average download rate is consistent.

E. C-HPP Newsletter No. 8 (August 1, 2019) is posted on the C-HPP wiki (https://c-hpp.web.rug.nl/).

F. C-HPP 2.0 Organization. At the 19th C-HPP Workshop, Santiago de Compostela, Spain (June 16 – 17, 2018) and the HPP workshop at HUPO-2018 Orlando, a new organizational plan, C-HPP 2.0, was presented to the PIC and C-HPP membership. It was designed to streamline the Chromosome teams and to mold the C-HPP according to interest in: Protein families; Rare Tissues and Cells; Chromosome Biology; Geography/National Groups; Proteoforms; uPE1 Functionalization; Technology and New Strategies; Interactomics. However, this was not at all supported by the PIC and members, who voted unanimously to maintain the current chromosome-based structure. The National Chromosome team structure was viewed as a strong positive that still provides opportunities for interdisciplinary projects. It was recognized that a core strength and success of the C-HPP has always been, and should continue to be, the annotation of the human proteome led by Lydie Lane (Head, neXtProt, Chr 2) and Eric Deutsch (Head, PeptideAtlas, Chr 6). The HPP cannot rely on ad hoc community data uploads to complete the project, because most ‘outside’ groups do not recognize the stringency required for MP identification. Nonetheless, several elements of the 2.0 plan have been, or will be in the coming years, implemented across the C-HPP. Moreover, the C-HPP aims to collaborate with B/D-HPP teams to utilise diseased tissue samples to seek MP “responder proteins” at selected stages of injury/disease/infection/stress and resolution in all the different human tissues likely to harbor MPs that may be key for regeneration or repair of these specific tissues. Similarly, the Pathology Pillar has great potential to confirm distribution of MPs and uPE1 proteins in health and disease at the cell and tissue level, particularly by tissue microarrays (TMAs) for targeted identification searches and clues for functionalization.

6. Future Activities

A. HUPO-2019 Adelaide: The C-HPP Poster Session will be held on Monday, September 16, 2019 during the HUPO Congress. The discussion will be led by Gilbert Omenn at each poster, where authors will start with a lightning presentation. We thank ProtiFi, LLC (Dr. John P Wilson) for their generous support of USD 600 for the Annual C-HPP Poster Awards. Dr. Sean O’Donoghue was invited by Chris Overall to present at the Post Congress HPP Day on the Structural Dark Proteome. Updates on the C-HPP activities will be presented on both days of the HPP Workshop in Adelaide.

B. neXt-MP50 2019-2020: With much of the “low hanging” proteome fruit having now been harvested, targeted proteome analyses of specific tissues and cells are now desperately needed, particularly of rare or rarely analyzed human cells and tissues, including developing tissues in the human embryo. This is the focus of the “Rare Cells and Tissues Proteomes” concept, now a recognized strategy pursued in collaboration with Neil Kelleher using top-down analyses. Identifying these temporally or spatially rare proteins will take dedicated searches of specific cells and tissues in adult tissues (e.g., olfactory epithelium below the cribriform plate), hard connective tissues (e.g., membranous and cartilaginous bone, dental cementum, dentine and enamel), or embryonic and fetal tissues at precise developmental windows, which is an ethical and practical challenge for many countries. Cerebrospinal fluid and both male and female reproductive tissues have proven to be rich sources of MPs that require extensive reanalysis using a variety of new tissue sample and proteomic techniques. Specific strategies to identify MPs can be beta-tested using, for example, recombinant MPs from full-length plasmids, available for ~62% of the missing proteins from the Chr 10 team led by Josh Labaer at Arizona State University.

C. neXt-CP50 2019-2020: Progress will be accelerated based on availability of resources useful for their characterization including antibodies through the Human Protein Atlas and expression clones for ~70% uPE1s available from the Chr 10 team led by Josh Labaer at Arizona State University.

D. 23rd C-HPP Symposium, St. Petersburg to Valaam Island, Russia: In 2020 we look forward to Russia hosting the 23rd C-HPP Annual Workshop, “From Chromosome-Centric Project to the Human Proteome”, on a river-class cruise ship traveling along the rivers and lakes between St Petersburg and Valaam Island. This Symposium format provides an opportunity for many informal and fruitful discussions between participants, combining a high-level scientific program with visits to Russian cultural and historic places between sessions.


Biology/Disease Human Proteome Project (B/D-HPP) Annual Report 2017-2018 Submitted by Fernando Corrales and Ileana Cristea on behalf of all members and liaisons of the B/D-HPP Executive Committee

1. Overview and expansion

The B/D-HPP is focused on supporting the use of state-of-the-art proteomic methods to characterize and quantify proteins for in-depth understanding of the molecular mechanisms of biological processes and human disease. The B/D-HPP is truly a grassroots initiative where groups of individuals come together globally to address key issues. One goal of the B/D-HPP is to extend the impact of proteomics to the broader community based on organ and disease areas. A second goal is to develop popular protein lists within the B/D-HPP in order to prioritize protein targets that are highly relevant to each field, to deliver relevant assays for the measurement of these selected targets, and to disseminate and make publicly accessible the information and tools generated. At HUPO Dublin, there were 19 B/D-HPP initiatives with 3 closely related HPP resource pillars. It is from these initiatives that the chairs and the B/D-HPP create the 6 main sessions for the international HUPO meetings in 2017 and 2018. These and other groups also have a chance to present and catch up with each other at the reporting-oriented Sunday and future-oriented Thursday workshops hosted by the whole HPP. An additional effort of the B/D-HPP is to encourage and support the scientific career development of HUPO early career researchers (ECRs). To this end, the initiative contributes strongly to ECR activities, helping them to create opportunities to present their work at international HUPO congresses (e.g. the ECR manuscript competition) or to interact with more senior HUPO scientists (e.g. ECR mentoring day). The B/D-HPP is led by Fernando Corrales (chair) and Ileana Cristea (co-chair), with the most amazing executive committee consisting of Jennifer Van Eyk (past-chair), Gil Omenn (ex officio), Hui Zhang, Eric Deutsch, Pengyuan Yang, Tadashi Yamamoto, Sanjeeva Srivastava, Paola Roncada, Michelle Hill, Ferdinando Cerciello (ECR representative) and Mark Baker (HPP Chair).

1.a. B/D HPP initiatives activity this year.

According to the procedure established last year, we sent out a questionnaire to determine the viability of the various initiatives and the work being carried out by them. The aims were:
1. To update the information about the initiatives on the B/D-HPP initiative page of the HUPO website. We would like to have an updated and attractive picture of the B/D group aims, activity and achievements to facilitate cross interactions with other HPP groups and to attract additional partners from the scientific community.
2. To compile the collected information in a B/D-HPP annual report. That will provide a global view of the B/D-HPP as a whole and will highlight our strengths.
3. To explore the willingness to participate in the HUPO Congress. This is the main HUPO activity and the participation of the B/D groups is highly encouraged by submitting abstracts to the HPP session topics, active inputs in the Sunday and Thursday programs on popular/priority proteins, PTMs, etc.
4. To explore the willingness to prepare manuscripts for the JPR Special Issue. To find new ways of interaction across B/D and C-HPP groups.


Results from the questionnaire

2018 Questionnaire B/D HPP

An international collaborative project that deals with mapping, annotating and characterizing the proteome using proteomics technologies in its relation to biology and/or diseases. B/D-HPP provides a framework for the coordination of 19 initiatives that integrate about 50 multi-national research groups.

B/D HPP Questionnaire items 1-4

1. Please state the name of your initiative, name and email of chair and co-chairs, starting date of your B/D initiative and any additional information:
2. What are your main aims?
2.a. Brief statement:
2.b. Some of the current lines of work include (fill in for your initiative):
2.c. Main achievements in 2017:
3. Any related documents you’d like to link?
3.a Websites and links (fill in for your initiative):
3.b Papers (fill in for your initiative)
3.b.1 Papers published in collaboration within the initiative
3.b.2 List Top 5 papers for 2017
3.c Congresses in 2017? (Committees, lectures…) (fill in for your initiative)
3.c.1 Participation in proteomics meetings in 2017?
3.c.2 Participation in congresses organized by other biomedical or clinical associations
3.d Other documents
4. Educational and dissemination activity (Courses, workshops, summer schools, etc) in 2017

[Bar chart (vertical axis 0 to 80); legend: * Collaborative publications, ** Proteomics, *** Biomedical/clinical. Bar values are not recoverable from the extracted text.]

• Updated information for each initiative: leadership, participating laboratories, main aims, plans and activity. Valuable information to keep the B/D HPP web page updated.
• In addition to the 72 papers (top 5 published by initiatives’ labs) reported by the 13 initiatives answering the questionnaire, 17 were collaborative efforts of the participating laboratories.
• Participation in 44 international congresses, 20 on proteomics and 24 on other biomedical and clinical disciplines.
• Organisation of 30 educational and dissemination related activities.

B/D HPP Questionnaire item 6

6. Are you currently doing research on popular/priority proteins? If not, would you be willing to?

[Bar chart of responses (Yes / No / In the future) by initiative. Values are not recoverable from the extracted text.]

• Protein lists available from PeptideAtlas (https://db.systemsbiology.net/sbeams/cgi/PeptideAtlas/proteinList?protein_list_id=45).
• Five papers already published (EyeOme, Liver and RAD).
• Two published tools for selecting popular proteins from the literature:
Data-Driven Approach To Determine Popular Proteins for Targeted Proteomics Translation of Six Organ Systems. Lam MP, Venkatraman V, Xing Y, Lau E, Cao Q, Ng DC, Su AI, Ge J, Van Eyk JE, Ping P. J Proteome Res. 2016 Nov 4;15(11):4126-4134. Epub 2016 Jul 19.
Systematic Protein Prioritization for Targeted Proteomics Studies through Literature Mining. Yu KH, Lee TM, Wang CS, Chen YJ, Ré C, Kou SC, Chiang JH, Kohane IS, Snyder M. J Proteome Res. 2018 Apr 6;17(4):1383-1396. doi: 10.1021/acs.jproteome.7b00772. Epub 2018 Mar 15.

B/D HPP Questionnaire item 7

7. Activities and meetings planned for 2018

[Table and bar chart of meeting participation/organization by initiative (columns: HUPO/HPP, Others). Values are not recoverable from the extracted text.]

• Some initiatives (HBPP, CVD, Liver, RAD) report activities including setting up analytical methods (for popular proteins in some cases) and preparation of publications (new data and reviews).
• All initiatives attending the HUPO meeting in Orlando were planning to present their work but this has not been included in the agenda.

B/D HPP Questionnaire item 8

8. During the analysis of your specific and perhaps unique samples that might be of great value to identify missing proteins:
• Would you be willing to share raw MS data sets for reanalysis in collaboration with other HPP initiatives?
• Would you be willing to share samples? Like what?
• Would you be willing to run samples from other teams?

Initiative      Data   Samples   Run samples
Cancer          1      1         1
CVD             1      1         1
EyeOme          1      0         0
FAN             1      1         1
HBPP            1      0         0
HIPP
Liver           1      0         1
Mitochondria    1      0         1
IMOP            1      0         1
PediOme         1      1         1
Plasma          1      0         1
RAD             1      1         1
HKUPP           1      0         1
Total           12     5         10

• Data sharing if published or under collaboration.
• Type of sample not indicated except in the case of RAD (serum). Difficulties due to regulatory issues.
• Availability to run a limited number of samples according to capacity.


B/D HPP Questionnaire item 10

10. Would your initiative be willing to prepare a data-driven manuscript based around your initiative for the HPP special issue in Journal of Proteome Research? Manuscripts are due by 31 May 2018.

Initiative      HPP 6 session track   Sunday HPP PIs   Thursday HPP   Early morning
Cancer          1                     1*               1              1*
CVD             1                     1                1              1
EyeOme          1                     1                1              1
FAN             1                     1                1              1
HBPP            1                     1                1              1
HIPP            1                     0                1              0
Liver           1                     1                1              1
Mitochondria    0                     0                0              0
IMOP            0                     0                0              0
PediOme         1                     1                1              1
Plasma          1                     1                1              1
RAD             1                     1                0              1
HKUPP           1                     1                1              1
Total           11                    9                10             9

2. Contributions to HUPO Orlando 2018. B/D-HPP continues to be involved at International HUPO:
Sunday: HPP Investigators Reporting Meeting
Sunday – Wednesday: Bioinformatics Hub
Monday – Wednesday: HPP Scientific Track comprising 6 sessions, ECR manuscript competition, Clinical Scientists Travel Awards, PhD Student Poster Awards
Thursday: HPP Future/Strategic Workshop

3. Outreach by B/D-HPP to the proteomics and broader scientific community

3a. B/D-HPP played a role in other proteomics and educational meetings or manuscripts this year. Some examples are listed below.
1. C-HPP meeting, Santiago de Compostela (2018). Links between C-HPP and B/D-HPP.
2. HPP session in the “Technological Platforms and Precision Medicine” Summer School. University Complutense of Madrid, El Escorial, July 23-27, 2018.
3. HIPP Summer School, Madrid, September 10-13, 2018.
4. 28th HBPP Workshop, May 8-9, 2018, Adelaide, Australia.

3b. B/D-HPP newsletter and contribution to HUPOST

In 2017, it was decided that the B/D-HPP newsletter would be incorporated into HUPOST. The modern look and feel of HUPOST is expected to be more user-friendly and engaging. Since Oct 2017, 10 HUPOST articles have been contributed from B/D-HPP (most written by Michelle Hill). A few initiatives were specifically invited to contribute and recently a general call was made to all B/D initiatives.
June - 1st HUPO Glycoproteomics Initiative Study (Nicki Packer)
May - Getting down to details: disease-associated PTMs and proteoforms (Michelle Hill)
April - HPP initiative on Rheumatic and autoimmune diseases (RAD) (Michelle Hill)
March - EyeOme (Michelle Hill)
Jan - Proteomics goes clinical (Michelle Hill) and Clinical Proteomics Course (Fernando Corrales)


For example, the Human Glycoproteomics Initiative (HGI) is running a community glycoproteomics software study, under leadership of Nicki Packer.

3c. Papers

One of the main goals of the B/D-HPP is to unveil the molecular basis of physiological and pathological processes by identifying the driver proteins involved. To guide studies in this direction, B/D-HPP initiatives have been encouraged to compile lists of popular proteins in their specific areas (proteins highly cited in association with the topic of interest) to generate functional hypotheses and to pave the way for new clinical developments. Two web tools have recently been developed to perform systematic bibliographic searches that rank the most cited proteins under a selected topic (Lam et al JPR 2017; Yu KS et al 2018). The usefulness of the popular protein approach has been demonstrated in two studies: one showing the principal role of the reconfiguration of one-carbon metabolism in the liver during hepatocarcinogenesis (Mora MI JPR 2017), and one developing a targeted method to monitor B-type natriuretic peptidoforms that might prove useful for the diagnosis and monitoring of heart failure (Shenyan Zhang et al JPR 2017).

Characterization of proteoforms and PTMs remains an unmet need in understanding the dynamics of pathogenic processes. A novel mass spectrometry-based whole protein assay enabled quantitation of the percentage of mutant KRAS4b present in colorectal cancer tissue, and of differences in C-terminal carboxymethylation, which is critical for KRAS function (Ioanna Ntai et al PNAS 2018). Understanding the PTM status of drug targets and its functional effects is key to next-generation therapies. Jenny Van Eyk and co-workers have shown that S-nitrosylation of GSK3B at specific residues sends the protein to the nucleus, away from its cytoplasmic location, resulting in a different repertoire of phosphorylated substrates and suggesting different drug responses (Shengbing Wang et al Circulation Research 2018).

MS is the only available technology to interrogate the immunopeptidome in an accurate, systematic and unbiased manner. The Human Immunopeptidome Proteome Project (HIPP) (Caron E et al Immunity 2017) has developed the first public database of quality-controlled immunopeptidomic data generated by mass spectrometry (Shao W et al Nucleic Ac Res 2018).
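The popular-protein idea described above (ranking proteins by how often they are cited together with a topic) can be illustrated with a minimal sketch. This is not the method of the cited web tools; the function names, the toy records and the whole-word matching rule are all assumptions made for illustration, and a real analysis would work from curated PubMed annotations rather than raw text.

```python
import re
from collections import Counter


def mentions(term, text):
    """Return True if `term` occurs in `text` as a whole word (case-insensitive)."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def rank_popular_proteins(records, topic_terms, protein_names):
    """Count, per protein, the records that mention both the topic and the protein."""
    counts = Counter()
    for text in records:
        # Skip records that do not mention any topic term.
        if not any(mentions(term, text) for term in topic_terms):
            continue
        for protein in protein_names:
            if mentions(protein, text):
                counts[protein] += 1
    # Most frequently co-mentioned proteins first.
    return counts.most_common()


# Toy, made-up records standing in for publication titles/abstracts.
toy_records = [
    "Toy abstract: liver fibrosis study measuring ALB and TGFB1.",
    "Toy abstract: hepatic metabolism and ALB turnover.",
    "Toy abstract: cardiac remodelling and NPPB levels.",
    "Toy abstract: liver regeneration and TGFB1 signalling with ALB as control.",
]

ranking = rank_popular_proteins(
    toy_records,
    topic_terms=["liver", "hepatic"],
    protein_names=["ALB", "TGFB1", "NPPB"],
)
print(ranking)
```

On the toy records, only the three liver-related entries are counted, so NPPB (mentioned only in the cardiac record) does not appear in the ranking.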

One of the principal aims of the B/D-HPP is to contribute to a better understanding of human organs and pathology by providing comprehensive proteome insights. To this end, the eye and plasma proteomes have recently been updated. A total of 9,782 non-redundant proteins are now in the human eye proteome database. Proteomes of 11 tissues and biofluids are included, with the highest number (6538 proteins) from vitreous humor and the lowest number (827) from aqueous humor (Ahmad MT et al Proteomics, 2018).

More than 122,000 peptide sequences belonging to 3509 protein identifications compliant with the HPP guidelines are described in the latest Plasma PeptideAtlas (Schwenk JM et al JPR 2017). In-depth analysis of the synaptosomal proteome allowed the association of specific protein expression patterns with social behaviour in patients with schizophrenia. This is an excellent example of the advantages of establishing joint C-HPP (chromosome 15) and B/D-HPP (Brazilian Brain initiative) ventures. A similar cooperation led to a comprehensive description of the human mitochondrial proteome under standardized protocols (Alberio T et al JPR 2017), which are currently being used to assess the pharmacological interest of targeting specific mitochondrial proteins to selectively kill cancer cells (Leanza L et al Cancer Cell 2017). Cancer biomarker discovery has experienced significant progress, as shown by the recent studies published by members of the Cancer B/D-HPP initiative.

As illustrated by the above-mentioned achievements, the productive interaction between HPP groups is shedding light on many relevant aspects of human biology. These cooperative efforts should guide our next steps in the endeavour of generating a comprehensive human proteome map with all the functional annotations needed to decipher the code of life and lay the foundations of future molecular precision medicine.

In addition, all B/D-HPP initiatives have drafted PubMed searches to capture all relevant human proteome references for incorporation into the new Human Proteome Reference Library (see following).


HPP Reference Library Construction (updated June 20, 2019)

Columns: No. | Name | Component | PubMed IDs (Title) | PubMed IDs (Title/Abstract), followed by the draft search provided (guessed by HPP Chair if not provided).

1 | Pan-HPP | Pan-HPP | 109 | 448
Search: (human proteome project)

2 | Pan-HPP | Pan-HPP | 76 | 222
Search: "human proteome project"

3 | C-HPP | C-HPP | 478 | 5,347
Search: ((chromosome or chromosomal or Chr or C-HPP or mitochondria or PE1 or PE2 or PE3 or PE4 or PE5 or missing protein or dark proteome or gene-centric or CP50 or next-MP50) and (“dark proteome” or “missing protein” or proteomic or proteogenomic or proteome or “mass spectrometry” or “mass spectrometric”)) or (“mitochondrial human proteome” or “mitochondrial proteome”)

4 | B/D-HPP | B/D-HPP | 15 | 1,531
Search: (biology/disease or B/D or B-D or iMOP or “biology- and disease-”) and (“missing protein” or proteomic or proteogenomic or proteome or “mass spectrometry” or “mass spectrometric”)

5 | Cancer-HPP #1 (least narrow) | B/D-HPP | 2,006 | 18,938
Search: (cancer or oncology) and (missing protein or proteomics or proteome or proteogenomic or mass spectrometry)

6 | Cancer-HPP #2 (more narrow) | B/D-HPP | 1,285 | 7,984
Search: (cancer or oncology) and (proteomics or proteome)

7 | Cancer-HPP #3 (most narrow) | B/D-HPP | 367 | 3,177
Search: (cancer or oncology) and (HUPO or HPP or "human proteome project" or C-HPP or B/D-HPP or "missing protein" or proteome)

8 | Plasma-HPP | B/D-HPP | 131 | 1,114
Search: (plasma OR serum AND (human NOT animal NOT seminal NOT plasma membrane NOT plasma cells[Title/Abstract]) and (HUPO or HPP or "human proteome project" OR C-HPP or B/D-HPP or "missing protein" or proteome)

9 | CVD-HPP | B/D-HPP | 565 | 8,973
Search: (cardiovascular or cardiac or stroke or heart or circulation) and (“human proteome project” or neXtProt or PeptideAtlas or “missing protein” or “dark proteome” or proteomic or proteogenomic or proteome or “mass spectrometry” or “mass spectrometric”)

10 | PediOme-HPP | B/D-HPP | 214 | 4,010
Search: (neonate or child or children or preterm) and (“missing protein” or proteomic or proteogenomic or proteome or “mass spectrometry” or “mass spectrometric”)

11 | Glycoproteomics-HPP | B/D-HPP | 450 | 2,327
Search: (glyco* not glycolysis) and (“missing protein” or biology/disease or B/D or B-D or proteomic or proteogenomic or proteome or “mass spectrometry” or “mass spectrometric”)

12 | Brain-HPP | B/D-HPP | 1,420 | 13,012
Search: (brain or neurone or neurological or cognitive or Alzheimer’s or MND or AVM or motor neurone or schizophrenia or Parkinson or neurodegeneration or ALS or “mental illness”) and (“missing protein” or proteomic or proteogenomic or proteome or “mass spectrometry” or “mass spectrometric”)

13 | Kidney/Urine-HPP | B/D-HPP | 37 | 427
Search: (kidney or nephropathy or urine) and (HUPO OR HPP OR human proteome OR C-HPP or B/D-HPP OR "missing protein")

14 | Liver-HPP | B/D-HPP | 295 | 1,494
Search: (liver or hepatoma or hepatocellular or hepatic) and (HUPO OR HPP OR "human proteome project" OR C-HPP or B/D-HPP OR "missing protein" or proteome)

15 | Infectious Disease-HPP | B/D-HPP | - | 7
Search: (infectious or infection) and (HUPO or HPP or "human proteome" or “missing protein” or proteomic or proteogenomic or proteome or “mass spectrometry” or “mass spectrometric”) and human proteome project

16 | Food & Nutrition-HPP | B/D-HPP | - | 6
Search: (food or or eat or taste or smell) and (HUPO or HPP or "human proteome" or “missing protein” or proteomic or proteogenomic or proteome or “mass spectrometry” or “mass spectrometric”) and human proteome project

17 | EyeOme-HPP | B/D-HPP | 21 | 293
Search: (eye OR ocular OR vision OR tears or ophalmol*) and (HUPO OR HPP OR "human proteome project" OR "missing protein" OR proteome)

18 | Extreme Conditions-HPP | B/D-HPP | 18 | 413
Search: ((extreme condition) or space) and (HUPO OR HPP OR "human proteome project" OR C-HPP or B/D-HPP OR "missing protein" OR proteome)

19 | Diabetes-HPP | B/D-HPP | 55 | 557
Search: (diabetes) and (HUPO OR HPP OR "human proteome project" OR C-HPP or B/D-HPP OR "missing protein" OR proteome)

20 | Rheumatic Disorders-HPP | B/D-HPP | 34 | 998
Search: (rheumatic disorder or rheumatic or rheumatoid or inflammation) and (HUPO OR HPP OR "human proteome project" OR C-HPP or B/D-HPP OR "missing protein" OR proteome)

21 | Protein Aggregation-HPP | B/D-HPP | 4 | 384
Search: (protein aggregation) and (HUPO OR HPP OR "human proteome project" OR C-HPP or B/D-HPP OR "missing protein" OR proteome)

22 | Pathology Pillar | Pathology Pillar | 13 | 506
Search: Pathology and (HUPO or HPP OR "human proteome project" OR C-HPP or B/D-HPP OR "missing protein" OR proteome)

23 | MS Pillar | MS Pillar | 433 | 9,645
Search: ((mass spectrometry) or (mass spectrum)) and (HUPO OR HPP OR "human proteome project" OR C-HPP or B/D-HPP OR "missing protein" OR proteome)

24 | KB Pillar | KB Pillar | 167 | 3,272
Search: (knowledgebase or database or dataset or metrics or guidelines or portal) and (HUPO OR HPP OR "human proteome project" OR C-HPP or B/D-HPP OR "missing protein" OR proteome)

25 | Ab/Affinity Reagents Pillar | Ab/Affinity Reagents Pillar | 70 | 1,162
Search: (antibody or (affinity reagent) or monoclonal or polyclonal) and (HUPO or HPP OR "human proteome project" OR C-HPP or B/D-HPP OR "missing protein" OR proteome)

26 | All search terms | All search terms | 122,814 | 15,562
Search: (chromosome or chromosomal or C-HPP or mitochondria or PE1 or missing protein or dark proteome or gene-centric or mitochondrial or neXt-CP50 or next-MP50 or biology/disease or B/D or B-D or iMOP or cancer or oncology or (plasma or serum and (human NOT animal NOT seminal NOT plasma membrane NOT plasma cells)) or cardiovascular or cardiac or stroke or heart or circulation or (glyco* not glycolysis) or brain or neurone or neurological or cognitive or Alzheimer’s or MND or AVM or motor neurone or schizophrenia or Parkinson or neurodegeneration or ALS or “mental illness” or kidney or nephropathy or urine or liver or hepatoma or hepatocellular or hepatic or infectious or infection or food or nutrition or eat or taste or smell or eye or ocular or vision or tears or ophalmol* or extreme condition or diabetes or rheumatic disorder or rheumatic or rheumatoid or inflammation or protein aggregation or Pathology or knowledgebase or database or dataset or metrics or guidelines or portal or antibody or affinity reagent or monoclonal or polyclonal or neonate or child or children or preterm) AND (HUPO or HPP or "human proteome project" or C-HPP or B/D-HPP or missing protein or proteome or neXtProt or PeptideAtlas or “dark proteome” or proteomic or proteogenomic or proteome or mass spectrometry or mass spectrometric)
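Draft searches of the kind listed above pair a group of topic terms with a group of proteome terms and are counted once against titles and once against titles plus abstracts. The sketch below shows one way such a query could be assembled with PubMed field tags and turned into an NCBI E-utilities `esearch` count request. How the searches were actually run is not stated in this report, so tagging every term with the field, and the helper names `or_group`, `build_query` and `count_url`, are assumptions for illustration. The code only builds the URL and does not contact PubMed.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def or_group(terms, field):
    """Join terms with OR, quoting multi-word phrases and tagging each with a field."""
    tagged = []
    for term in terms:
        quoted = f'"{term}"' if " " in term else term
        tagged.append(f"{quoted}[{field}]")
    return "(" + " OR ".join(tagged) + ")"


def build_query(topic_terms, proteome_terms, field):
    """Combine a topic group and a proteome group with AND."""
    return or_group(topic_terms, field) + " AND " + or_group(proteome_terms, field)


def count_url(query):
    """Build an esearch URL that asks PubMed only for the number of hits."""
    params = {"db": "pubmed", "term": query, "rettype": "count"}
    return ESEARCH_URL + "?" + urlencode(params)


# Terms taken from the Liver-HPP draft search (row 14).
liver_terms = ["liver", "hepatoma", "hepatocellular", "hepatic"]
proteome_terms = ["HUPO", "HPP", "human proteome project", "C-HPP",
                  "B/D-HPP", "missing protein", "proteome"]

liver_title = build_query(liver_terms, proteome_terms, "Title")
liver_tiab = build_query(liver_terms, proteome_terms, "Title/Abstract")

print(liver_title)
print(count_url(liver_tiab))
```

Keeping the topic and proteome term lists separate makes it straightforward to reuse the same proteome group across initiatives and to compare Title against Title/Abstract counts for each row.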

Extracts from HUPO2018 Orlando HPP SAB Presentation & Gil’s Thank You


HUPO’s HPP Governance Working Party (commences after Adelaide HUPO Congress)

With (i) the HPP composition, terms and responsibilities document finalised, (ii) the HPP co-chair advertisement, (iii) a new SAB Chair (Ruedi Aebersold) and (iv) planning to revitalise the SAB underway, it was agreed to establish the HPP Governance Working Party. The terms of reference of the HUPO HPP Governance Working Party, as approved by the HUPO EC, are below.

Terms of Reference
1. The HPP Governance Working Party will be formed by the HUPO EC from 5-7 HUPO and HPP EC members.
2. The aim of the HPP Governance Working Party is to prepare draft recommendations to be forwarded to the HUPO and HPP EC for approval.
3. The draft recommendations of the HPP Governance Working Party should concern processes relating to efficient, collegial and representative matters impacting HUPO’s flagship global scientific project, the Human Proteome Project (HPP).
4. The HPP Governance Working Party should generally represent the HUPO membership, including but not exclusively current elements of the HPP (i.e., C-HPP, B/D-HPP and Resource Pillars).
5. The HPP Governance Working Party will be supported by the HUPO Secretariat.
6. The HPP Governance Working Party should produce a document called the HPP Governance Working Party Draft Report for consideration/revision by the HUPO EC and HUPO Council.
7. The HPP Governance Working Party Draft Report should address, as comprehensively as possible, issues relating to (i) HPP Governance, (ii) HPP Operations, and (iii) HPP Standard Operating Procedures.

5 Elements of Good Governance
1. follows rule of law
2. is transparent, effective, efficient, accountable, participatory and responsive to opportunities and challenges
3. seeks consensus
4. future-proofs the organisation
5. ensures equity/inclusiveness

Draft Working Party composition
• Mark Baker (Chair)
• Rob Moritz (HUPO EC)
• Gil Omenn (oversight, input, draft report review)
• Fernando Corrales (B/D-HPP)
• Chris Overall (C-HPP)
• Lydie Lane (C-HPP/KB pillar)
• Ferdinando Cerciello (ECRs; the future)
• Anne-Claude Gingras (member-at-large)


Our objective is to get what should be a simple task well documented and agreed by the HUPO community, and then move on to matters of HPP scientific impact, HPP public outreach and HPP funding. In brief, a common view is that good governance requires a fair, legal, representative framework that is “required” by an impartial regulatory body for the protection of stakeholders (here: HUPO members; HPP stream, pillar, Chr and B/D initiative scientists; funding agencies; the general research community; the public; etc). We already have in place (i) an HPP structure, (ii) HPP EC composition, (iii) HPP EC terms, (iv) HPP EC roles, (v) HUPO by-laws, (vi) the HUPO EC, and (vii) the HUPO Council. There are also a number of publications (including the JPR SI) that specifically address where the HPP is headed.

The HPP Governance Working Party might consider developing responses to each of the following:
1. develop a revised HPP structure that incorporates the pathology pillar and the cell/tissue stream, with new timelines and phases
2. write revised aims, objectives, anticipated phases, milestones and a new timeline (effectively a new HPP Framework)
3. establish a representative one-page “pitch” for all HPP EC nominations from the HUPO membership (as we do for Council)
4. start representative online elections for the HPP EC from all HUPO membership
5. preserve self-governance of the 2 HPP streams and 4 HPP pillars


Legacies of HUPO’s Human Proteome Project

1. Annotating the human proteome
2. Building a curated version of the human proteome that will be considered as the reference Human Proteome
3. Creating a comprehensive, accurate, accessible and leverageable Human Proteome Knowledgebase
4. Establishing a community-wide mission aimed at proteomic mapping of all human proteins systematically using current/emerging techniques
5. Being the resource that brings together proteomics research communities and serves as an entry/contact/connecting point for the life science community and industry alike
6. Establishing a toolbox and workflows for enrichment, detection, quantification and functional characterization of missing proteins
7. Improving detection of missing proteins with low abundance and/or restricted temporo-spatial expression
8. Providing important proteomic foundations/standards/metrics/guidelines/rules for confident protein identification (PSI, MRM atlas etc), phosphorylation, missing protein identification (C-HPP), analysis of antibody data (Human Protein Atlas), etc
9. Getting the proteomics community together
10. Proteomic community building: specifically, providing an inclusive, supportive environment for the development of scientific groups focused on a specific disease or organelle. These are homegrown groups that extend proteomics into their areas of interest.
11. Supporting early career scientists and clinical scientists
12. Providing critical white papers/tools for application of proteomics in human disease
13. Establishing a reputation as a global leader in uptake/validation of proteomics data in precision medicine
14. Utilising state-of-the-art technologies to facilitate research that improves the lives of people throughout the full age spectrum, through:
   a. Collaboration with industry to develop user-friendly and easily adaptable platforms for clinical use
   b. Collaboration by researchers to understand physiological development of the human proteome
   c. Developing technologies to monitor wellness
   d. Solving mechanisms of diseases that affect large proportions and small numbers of people
   e. Utilising proteomics to design personalised treatment options for disease
   f. Utilising proteomics in daily diagnostics


Current Achievements of HUPO’s Human Proteome Project (HPP)

1. The HPP commitment to high-quality science in proteomics experiments and data analysis. This includes the development and wide utilization of the HPP Guidelines for Interpretation of Mass Spectrometry Data (v2.0, JPR 2016) and the encouragement of generations of improvements in the underlying technology platforms.
2. Providing a structure and foundation for mapping the human proteome via diverse technologies, approaches and applications.
3. Putting forward the efforts of the proteomics community for the greater scientific picture by demonstrating the essential contribution of proteins to understanding the biology of life, health and disease.
4. The placement of neXtProt, PeptideAtlas, and Human Protein Atlas at the center of the proteomics world.
5. The creation of ProteomeXchange in conjunction with the European Bioinformatics Institute and PRIDE to register and make widely available the primary data and metadata from all proteomics experiments, incorporated into the HPP Guidelines.
6. The organized effort to apply standardized reanalysis of all MS-based proteomics datasets at PeptideAtlas and curation of multiple kinds of protein studies at neXtProt to progressively complete the “protein parts list” for the human proteome. As of January 2018, 89% of predicted protein-coding genes (17,470 proteins) were rated as having protein evidence PE1 (of a total of 19,656 PE 1+2+3+4). This percentage continues to rise, with leadership from the Chromosome-centric C-HPP and the KB resource pillar of the HPP. Each year the HPP publishes the “Metrics paper” in the Journal of Proteome Research with progress from the entire community.
7. The development of the SRM Atlas to expedite targeted proteomics for quantitative biological studies throughout the life sciences by providing spectra from synthesized, predicted proteotypic peptides of nearly every predicted protein.
8. The use of bibliometric analyses to identify the most investigated proteins (“popular proteins”) by organ and disease category and connect those research communities to the SRM Atlas as a guide to quantitative targeted assays of those key proteins and their pathways and networks, led by the Biology and Disease-based B/D-HPP.
9. The tissue-specific and intracellular localization of expression of proteins with immunohistochemistry/immunofluorescence, correlation with organ-specific transcript expression, and quality assurance of antibody specificity by Human Protein Atlas (Antibody-Profiling resource pillar of the HPP).
10. Continued effort to make proteomics an obligatory component of multi-omics research protocols, recognizing that crucial properties of proteins (dynamics of abundance, post-translational modifications, and splice isoform variants) cannot be predicted or detected at the gene or transcript levels. This is complemented by the C-HPP-led initiative to find evidence of function for the 1260 PE1 proteins lacking specific functional annotation.
11. Multiple efforts to engage, mentor, and highlight research from Early Career Researchers, the future of the field.
