Contributing to the UniProt Knowledgebase - How You Can Help

Cecilia Arighi, PhD, PIR team lead, UniProt

Outline
• UniProt overview
• UniProt curated and additional publications
• Community contribution: Why? What is the benefit?
• Your personal researcher identifier
• Publication submission and review processes
• What happens after submission
• Examples and demo

The Universal Protein Resource (www.uniprot.org)
A comprehensive, high-quality and freely accessible resource of protein sequence and functional information.

Resources we provide
• >500K unique visitors per month
• Tools: BLAST, sequence alignment, peptide search, ID mapping
• Downloads and programmatic access: https://www.uniprot.org

The knowledgebase: UniProtKB (release 2020_02)
• Reviewed (Swiss-Prot): high-quality expert curation; non-redundant (1 entry per gene per species); cross-references in every entry
• Unreviewed (TrEMBL): automatic annotation; computationally generated; sequence redundancy allowed; cross-references in every entry

Types of data in a UniProtKB entry
• Functional annotation: comprehensive
• Names: standardized
• Identifiers: unique, stable
• Sequences: isoforms, variants, etc.
• Amino acid-specific data
• Links to specialized databases

Expert curation
Expert curation of the literature produces annotations, organized into topics in the UniProtKB entry (entry view). Curation results either in updates to existing entries or in new entries.
[Screenshot of the entry view: UniProtKB O95832 (CLD1_HUMAN), Claudin-1, gene CLDN1, Homo sapiens (Human), http://www.uniprot.org/uniprot/O95832. Status: Reviewed; experimental evidence at protein level. The Function section cites its supporting publications, e.g. CLDN1 acting as a receptor for hepatitis C virus (PubMed:17325668) and dengue virus (PubMed:24074594), and GO terms are listed with their evidence sources.]
Poux, Arighi, Magrane et al. 2017, Bioinformatics 33(21):3454, doi: 10.1093/bioinformatics/btx439

However…
• UniProt has a finite curation task force.
• Expert curation activity is prioritized, focusing on certain taxonomic groups or protein sets.
• The set of articles supporting the annotations is a selection that represents the landscape of knowledge about the protein at a given time (PMID:29036270).
• Emerging critical topics, such as COVID-19, accumulate knowledge rapidly and demand up-to-date coverage.

To expand access to published knowledge about a protein entry
• Complement the UniProt literature set with additional publications
• Computationally mapped publications from external resources
• Leverage community expertise for adding publications and information (annotations)
• Classify publications into the entry annotation topics to improve navigation and discovery

Publication display in a UniProt entry
Example: Sperm-associated antigen 5. The publications page lists UniProt and additional publications, which can be filtered by annotation topic: https://www.uniprot.org/uniprot/Q96R06/publications

Computationally mapped bibliography
• Sources of literature (curated and text-mining sources): MGI, SGD, dictyBase, WormBase, PomBase, TAIR, FlyBase, ZFIN, RGD, IC4R, GeneRIF, GAD, Alzforum, PhosphoSitePlus, iPTMnet, PRO, PDB, pGenN, PubTator, BioMuta, MEROPS, IntAct, BioCyc, Reactome
• The additional bibliography for UniProt release 2020_02 covers (unique): 39,366,390 AC/PMID pairs; 347,890 ACs; 985,675 PMIDs
• Article categorization into the different UniProt topics: the UPCLASS classification for UniProt release 2020_02 covers 37,893,926 AC/PMID pairs; 345,617 ACs; 954,619 PMIDs

Community: you as a contributor of literature and knowledge
Why?
• You have the expertise.
• You can help scale up curation.
• You asked.
Benefits to you
• Recognition for the papers and annotations contributed
• Your contribution is citable and can be used as a deliverable of your research
• Play an active role in improving the database
• An improved database better supports the research community
Icon made by Flat Icons from www.flaticon.com

Poll: Did you know about the different publication sources in UniProt (curated, from external resources, and community) prior to this webinar?

ORCID (https://orcid.org/)
• Unique digital identifier for researchers
• You control the public data in your profile
• Used as the login mechanism to verify your identity
• Used to give you recognition for your contribution

Poll: Do you have an ORCID iD?

System overview

Snapshot of the submission form
1. Auto-filled with data from the entry
2. Check that the publication exists and has not been curated in UniProt
3. What aspects of the protein does the publication describe?
4. Does the publication provide a name for the gene/protein of the entry?
5. Does the publication show some function of this protein?
6. Does the publication show any aspect that associates this protein with some disease?
7. Any other annotation?
8. Your contact information (not to be shared)
9. Agree to show your ORCID iD on the website for recognition

Review process
• Ensure the content is appropriate: only facts related to the protein as described in the publication, not personal opinions
• Minor edits to correct typos and grammar, and for standardization purposes
• Other content changes are made only with the submitter's permission
Track the status of submissions: https://community.uniprot.org/bbsub/bbsubinfo.html

Status tags: what they mean and who can view them
• Public: the submission can be viewed in the publication section of the entry on the UniProt website. Visible to everyone.
• Reviewed: the submission has been reviewed and will show on the UniProt entry page on the website in an upcoming release. Visible to everyone.
• Under Review: the submission has not yet been reviewed by UniProt and is not ready for release. Visible to the submitter when signed in with ORCID.
• Dropped: the submission has been found inappropriate (e.g., incorrect association of the paper with the entry). Visible to the submitter when signed in with ORCID.
Submission status page: https://community.uniprot.org/bbsub/bbsubinfo.html

ORCID as source attribution for your work: https://www.uniprot.org/uniprot/Q96R06/publications

Community submission statistics: https://community.uniprot.org/bbsub/STATS.html

Linking a publication to an entry
• You don't need to be an author of the publication to contribute.
• It is important to match the protein described in the publication to the entry for the correct species.
• If you are the author, you know which species you worked on.
• If you are not the author, consider the following tips:
• Check the species information in the materials and methods section of the paper, then search UniProt with the protein name and species.
• Does the publication provide any identifiers for the proteins/genes (e.g., GenBank, PDB)?
• Is there any sequence that can be compared to the UniProt one?

Example of a publication that provides an identifier:
"Sequence analysis indicates that the cDNA is 3,843-nt long and encodes a protein of 1,193 aa with a predicted molecular mass of 134,400 Da (Fig. 2; accession no. AF399910)." https://www.ncbi.nlm.nih.gov/pmc/articles/PMC64699/
• Use UniProt Retrieve/ID mapping to map external identifiers to UniProt: https://www.uniprot.org/uploadlists/

Publication has some sequence that can be used to find the entry
TRPV1-long and TRPV1-short in vampire bat differ in their C-terminal sequence:
TRPV1-S GIKRTLSFSLRSSRAV
TRPV1-L GIKRTLSFSLRSSRVAGRNWKNFALVPLLRDASTRERQP
Gracheva et al., Nature. 2011 Aug 3;476(7358):88-9.
PMID: 21814281
• Use peptide search with a subsequence to find the correct entry: https://www.uniprot.org/peptidesearch/
[Figure: peptide searches with GIKRTLSFSLRSSRAV and GIKRTLSFSLRSSRVAGRNWKNFALVPLLRDASTRERQP retrieve the Desmodus rotundus (Vampire bat) TRPV1-S and TRPV1-L entries, respectively.]

Demo on community submission
Go to https://community.uniprot.org/bbsub/doc/public/CommunitysubmissionUniProtdemo_voice.mp4

https://covid-19.uniprot.org/uniprotkb?query=*

Future work in community submissions
• Submission in batch.
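The peptide-search tip on the vampire bat TRPV1 slide relies on exact subsequence matching: a peptide taken from the divergent C-terminal tail occurs in only one isoform. The Python sketch below illustrates that idea locally on the two tails quoted on the slide. It is a toy under stated assumptions (exact, case-insensitive substring matching against two short fragments, not full-length entries) and is not how UniProt's peptide search service is implemented. The names `ISOFORM_TAILS`, `peptide_matches` and `divergence_point` are hypothetical.

```python
from typing import Dict, List

# C-terminal tails quoted on the slide (Gracheva et al., 2011).
# Assumption: only these fragments are searched, not full-length sequences.
ISOFORM_TAILS: Dict[str, str] = {
    "TRPV1-S": "GIKRTLSFSLRSSRAV",
    "TRPV1-L": "GIKRTLSFSLRSSRVAGRNWKNFALVPLLRDASTRERQP",
}


def peptide_matches(peptide: str, sequences: Dict[str, str]) -> List[str]:
    """Return the names of all sequences containing `peptide` as an exact substring."""
    query = peptide.strip().upper()  # tolerate lowercase input
    return [name for name, seq in sequences.items() if query in seq]


def divergence_point(a: str, b: str) -> int:
    """Return the 0-based index of the first position where `a` and `b` differ."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One sequence is a prefix of the other.
    return min(len(a), len(b))


if __name__ == "__main__":
    print(peptide_matches("SSRAV", ISOFORM_TAILS))   # ['TRPV1-S']
    print(peptide_matches("RNWKNF", ISOFORM_TAILS))  # ['TRPV1-L']
    print(divergence_point(ISOFORM_TAILS["TRPV1-S"], ISOFORM_TAILS["TRPV1-L"]))  # 14
```

Because the two tails share their first 14 residues (GIKRTLSFSLRSSR), a query drawn only from that shared prefix matches both isoforms. Only a query that extends past it can tell them apart.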

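The "(unique)" label on the computationally mapped bibliography slide suggests that distinct AC/PMID pairs are counted after merging all sources. This would explain why the pair count (39,366,390) is far larger than either the accession count or the PMID count. The minimal Python sketch below shows that kind of counting. It assumes each source is simply a list of (accession, PMID) pairs. The source names, accessions and PMIDs are hypothetical placeholders, and the code is not UniProt's actual pipeline.

```python
from typing import Dict, Iterable, Set, Tuple

# An (accession, PMID) pair links one UniProtKB entry to one PubMed article.
Pair = Tuple[str, int]

# Hypothetical toy input: placeholder accessions and PMIDs, not real mappings.
TOY_SOURCES: Dict[str, Iterable[Pair]] = {
    "source_1": [("AC_A", 101), ("AC_A", 102), ("AC_B", 101)],
    "source_2": [("AC_A", 101), ("AC_B", 103)],  # ("AC_A", 101) repeats source_1
}


def merge_bibliography(sources: Dict[str, Iterable[Pair]]) -> Set[Pair]:
    """Union the pairs from every source so that a pair reported twice counts once."""
    merged: Set[Pair] = set()
    for pairs in sources.values():
        merged.update(pairs)
    return merged


def coverage(pairs: Set[Pair]) -> Tuple[int, int, int]:
    """Return (unique pairs, unique accessions, unique PMIDs)."""
    accessions = {ac for ac, _ in pairs}
    pmids = {pmid for _, pmid in pairs}
    return len(pairs), len(accessions), len(pmids)


if __name__ == "__main__":
    print(coverage(merge_bibliography(TOY_SOURCES)))  # (4, 2, 3)
```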