Webinar instructions
• GoToTraining works best in Chrome or Internet Explorer – avoid Firefox due to audio issues with Macs
• To access the full features of GoToTraining, use the desktop version by clicking on the flower symbol and selecting “switch to desktop version”
• All microphones will be muted whilst the trainer is speaking
• If you have a question, please use the chat box at the bottom of the GoToTraining box
Exploring protein function and sequence using UniProt
Hema Bye-A-Jee
help@uniprot.org
UniProt (Universal Protein Resource)
A comprehensive, high-quality and freely accessible resource of protein sequence and functional information:
• Detailed information on protein function, interactions, pathways etc.
• Sequences, including isoforms, disease variants and PTMs
• Stable identifiers (accessions)
European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
Protein Information Resource (PIR), Washington DC and Delaware, USA
SIB Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland
Species Coverage in UniProt
• Protein sequence and functional information for a large variety of species.
• Enable cross-species comparison of orthology and conservation.
• Annotation projects for key model organisms:
  • Work closely with model organism databases
  • Communicate advances in research to the whole scientific community.
Resources we provide and their links
The two datasets in UniProtKB
Reviewed entries (UniProtKB/Swiss-Prot): ~0.5 million
• expert-curated information on all aspects of a protein, e.g.:
  • function
  • residue-specific data (e.g. active sites)
• protein isoforms of one gene in one UniProtKB record (per species)
Unreviewed entries (UniProtKB/TrEMBL): ~100 million
• protein isoforms of one gene in different UniProtKB records (per species)
• mapped experimental sequence features (from protein 3D structures)
• annotations from rule systems (incl. expert-curated rules)
Exploring C. elegans protein entries in UniProtKB
The Caenorhabditis protein annotation project: Characterisation of C. elegans proteins
http://ibgwww.colorado.edu/tj-lab/worms/N2wormb.jpg
Work closely with the worm research community and with WormBase
At present, there are over 4000 manually reviewed C. elegans protein entries
Homepage: Search
Free text search bar
Default search is UniProtKB
All sections of UniProt, including specific data analysis tools such as BLAST, may be accessed through the quick access tabs below the main search bar on the homepage
Methods for searching UniProtKB
• Free text search queries
• Advanced search options: clicking on “Help” allows you to find your field of interest more quickly
Search results
Reviewed/manually curated entries
Unreviewed entries
Filtering search results
Filter by:
• Entry type (reviewed/unreviewed)
• Organism
Change view of the data set identified in search
Selecting entries
Click to open entry
Tick box to select an entry or several entries to use tools and/or add to basket
Customise displayed results
Click “Columns” to customise the information displayed
Select information displayed in the search
Click “Save” to apply selection and display preferences
Downloading specific information
Customising the displayed results allows very specific data downloads. With customised data columns, a tab-separated download contains only the selected columns. This is a convenient way to create concise datasets for further analysis.
Accessing an entry
Click on the accession number to open an entry
Entry view
Quick access tabs list entry contents: click to jump to a section and/or (de)select to toggle content visibility
Types of data in a UniProtKB entry
• Names: standardized
• Functional annotation: comprehensive
• Sequences: isoforms, variants etc.
• Identifiers: unique, stable
• Links to specialized databases
• Amino acid-specific data
• Scientific literature
• Sequence analysis tools
• External databases
Data sources and referencing
Qualifiers at the end of sections and comments denote the data source and provide links to supporting data and additional information
Publications and referencing
Publications referenced throughout entry using their PubMed identifiers
List of publications accessed through quick access tab
Publications viewer
An expanded publications view incorporating filters and access to mapped publications.
Indicate whether the entry has been manually reviewed
Hyperlinks to the article in PubMed and Europe PMC
Hyperlink to a list of entries that also cite article
Indicate the information/data obtained from article and included in entry
Sequences: Using all isoforms
Describes the number of isoforms and how produced
Aligns ALL sequences
Adds ALL sequences to basket
Sequences: Using individual isoforms
Name of isoform
Provide details of EACH isoform
Tools can be applied to EACH isoform
Sequences of EACH isoform can be downloaded and added to basket
Feature table: Amino acid specific view - tabular
Amino acid specific data is listed by type of modification
List amino acid specific data and drop-down list of relevant publications
Feature viewer: Amino acid specific view - graphical
Disease vs Disruption Phenotype
Disruption Phenotype: Describes null mutants and knockdown as a result of RNAi
Phenotypes as a result of single/ double/multiple mutations
Phenotypes as a result of RNAi, indicating developmental stage if applicable
Mutagenesis: Describes mutations that disrupt one or multiple amino acids
Provide name of mutant if known
Indicate if phenotype is due to mutations of several amino acids
Cross references
Provide links to over 150 external databases such as nucleotide sequence databases, model organism databases and genomics and proteomics resources.
Open reading frame (ORF) identifier
Protein sequence identifier in WormBase
WormBase entry accession
Gene name
Gene name and open reading frame identifiers in line with WormBase
Accessing UniProt Data
• UniProt is released every 4 weeks and is freely available: http://www.uniprot.org/downloads
• Provided in a range of formats: text, XML, XML/RDF, FASTA, GFF, tab-delimited
• Web site
  - Supports simple and complex queries
  - RESTful API
  - Documentation and details at: http://uniprot.org/help/programmatic_access
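As a rough illustration of the RESTful access and tab-delimited format mentioned above, the sketch below builds a search URL and parses a tab-separated response. It is a minimal sketch under stated assumptions, not the documented interface: the base URL, the parameter names (query, format, fields, size) and the field names are assumptions based on the current UniProt REST service and should be checked against the programmatic access documentation. The helper names build_search_url and parse_tsv are hypothetical, and the sample row is a made-up placeholder, not real UniProt data. No network request is made.

```python
import csv
import io
from urllib.parse import urlencode

# Assumption: base URL of the UniProt REST search service. Verify it against
# the programmatic access documentation before relying on it.
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query, fields, fmt="tsv", size=25):
    """Return a search URL for the given query, selected columns and format.

    Hypothetical helper: the parameter names are assumptions, see lead-in.
    """
    params = {
        "query": query,
        "format": fmt,
        "fields": ",".join(fields),
        "size": size,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def parse_tsv(text):
    """Turn tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


if __name__ == "__main__":
    # Reviewed C. elegans entries (NCBI taxonomy ID 6239), three columns only.
    url = build_search_url(
        "organism_id:6239 AND reviewed:true",
        ["accession", "gene_names", "length"],
    )
    print(url)

    # Placeholder response text standing in for a real download.
    sample = "Entry\tGene Names\tLength\nACC0001\tgene-1\t123\n"
    for row in parse_tsv(sample):
        print(row["Entry"], row["Gene Names"], row["Length"])
```

The URL can be opened in a browser or fetched with any HTTP client; selecting only the fields of interest mirrors the customised-columns download available on the website.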
Useful links
Contact us: help@uniprot.org
• Online training:
  https://www.ebi.ac.uk/training/events/2018/exploring-protein-function-and-sequences-using-uniprot
  https://www.ebi.ac.uk/training/online/course/enzymes-uniprot
  https://www.ebi.ac.uk/training/online/course/exploring-models-human-disease-uniprot
  https://www.ebi.ac.uk/training/online/course/uniprot-peptide-search-and-website-updates-webinar
  https://www.ebi.ac.uk/training/online/course/uniprot-exploring-protein-sequence-and-functional
  https://www.ebi.ac.uk/training/online/course/uniprot-exploring-protein-sequence-and-functional-0
  https://www.ebi.ac.uk/training/online/course/uniprot-programmatic-access-uniprotkb-webinar
  https://www.ebi.ac.uk/training/online/course/uniprot-quick-tourversion-0
UniProt funding
UniProt is supported by the National Institutes of Health (NIH), National Human Genome Research Institute (NHGRI) and National Institute of General Medical Sciences (NIGMS) grant U41HG007822.
Additional support for the EMBL-EBI's involvement in UniProt comes from European Molecular Biology Laboratory (EMBL), the British Heart Foundation (BHF) (RG/13/5/30112), the Parkinson's Disease United Kingdom (PDUK) GO grant G-1307, and the NIH GO grant U41HG02273.
UniProt activities at the SIB are additionally supported by the Swiss Federal Government through the State Secretariat for Education, Research and Innovation SERI.
PIR's UniProt activities are also supported by the NIH grants R01GM080646, G08LM010720, and P20GM103446, and the National Science Foundation (NSF) grant DBI-1062520.
UniProt Team
PIs: Alex Bateman, Cathy Wu, Alan Bridge
Key staff: Cecilia Arighi (Curation), Darren Natale (Content), Hongzhan Huang (Development), Manuela Pruess (Coordination), Maria Martin (Development), Michele Magrane (Curation), Nicole Redaschi (Development), Peter McGarvey (Content), Sandra Orchard (Content), Shriya Raj (Coordination), Sylvain Poux (Curation)
Content/Curation: Aleksandra Shypitsyna, Alistair MacDougall, Andre Stutz, Andrea Auchincloss, Anne Estreicher, Anne Morgat, Arnaud Gos, C. R. Vinayaka, Catherine Rivoire, Chantal Hulo, Christian Sigrist, Cristina Casals Casas, Damien Lieberherr, Elena Speretta, Elisabeth Coudert, Emma Hatton-Ellis, Emmanuel Boutet, Florence Jungo, George Georghiou, Ghislaine Argoud-Puy, Guillaume Keller, Hema Bye-A-Jee, Ivo Pedruzzi, John S. Garavelli, Karen Ross, Kate Warner, Kati Laiho, Klemens Pichler, Kristian Axelsen, Lai-Su Yeh, Lionel Breuza, Livia Famiglietti, Lucila Aimo, Marc Feuermann, Michael Tognolli, Nadine Gruaz, Nevila Hyka-Nouspikel, Nidhi Tyagi, Patrick Masson, Penelope Garmiri, Philippe Lemercier, Qinghua Wang, Ramona Britto, Rossana Zaru, Sandrine Pilbout, Shyamala Sundaram, Ursula Hinz, Yvonne Lussi
Development: Alan Da Silva, Alexandr Ignatchenko, Andrew Nightingale, Arnaud Kerhornou, Beatrice Cuche, Benoit Bely, Borisas Bursteinas, Chuming Chen, Delphine Baratin, Dushyanth Jyothi, Edouard De Castro, Edward Turner, Elisabeth Gasteiger, Emanuele Alpi, Guoying Qi, Hermann Zellner, Jerven Bolleman, Jian Zhang, Jie Luo, Joseph Onwubiko, Leonardo Gonzales, Leslie Arminski, Leyla Garcia Castro, Mark Bingley, Monica Pozzato, Parit Bansal, Rabie Saidi, Sangya Pundir, Sebastien Gehant, Teresa Batista Neto, Thierry Lombardot, Tony Sawford, Tony Wardell, Tunca Dogan, Vicente Lara, Vishal Joshi, Vladimir Volynkin, Wudong Liu, Xavier Martin, Xavier Watkins, Ying Yan, Yongxing Chen, Yuqi Wang
European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
Protein Information Resource (PIR), Washington DC and Delaware, USA
SIB Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland
Upcoming webinars
See the full list of upcoming webinars at https://www.ebi.ac.uk/training/webinars
Don’t forget! Please fill in the survey that launches after the webinar – thanks!