Webinar instructions

• GoToTraining works best in Chrome or Internet Explorer – avoid Firefox due to audio issues with Macs

• To access the full features of GoToTraining, use the desktop version by clicking on the flower symbol and selecting “switch to desktop version”

• All microphones will be muted whilst the trainer is speaking

• If you have a question, please use the chat box at the bottom of the GoToTraining box

Exploring function and sequence using UniProt

Hema Bye-A-Jee
help@.org

UniProt (Universal Protein Resource)

A comprehensive, high-quality and freely accessible resource of protein sequence and functional information:

• Detailed information on protein function, interactions, pathways etc.
• Sequences, including isoforms, disease variants and PTMs
• Stable identifiers (accessions)
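Because accessions are stable and follow a fixed pattern, they can be checked in a script before being used in a search or download. The sketch below assumes the accession pattern given in the UniProt help pages (6 or 10 alphanumeric characters); the function name `is_uniprot_accession` is hypothetical and not part of any UniProt tool.

```python
import re

# Accession pattern as given in the UniProt help pages (assumption: verify
# against the current help text before relying on it in production code).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(text: str) -> bool:
    """Return True if `text` is shaped like a UniProtKB accession.

    This checks the format only; it does not confirm that the entry exists.
    """
    return ACCESSION_RE.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    for candidate in ["P12345", "A0A024R161", "p12345", "P1234"]:
        print(candidate, is_uniprot_accession(candidate))
```

A format check like this only catches typing and copy-paste errors; whether an accession is current still has to be confirmed on the website.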

European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
Protein Information Resource (PIR), Washington DC and Delaware, USA
SIB Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland

Species Coverage in UniProt

• Protein sequence and functional information for a large variety of species.
• Enable cross-species comparison of orthology and conservation.
• Annotation projects for key model organisms:
• Work closely with databases
• Communicate advances in research to the whole scientific community.

Resources we provide and their links

The two datasets in UniProtKB

Reviewed (~0.5 million entries)
• expert-curated information on all aspects of a protein, e.g.:
• function
• residue-specific data (e.g. active sites)
• protein isoforms of one gene in one UniProtKB record (per species)

Unreviewed (~100 million entries)
• protein isoforms of one gene in different UniProtKB records (per species)
• mapped experimental sequence features (from protein 3D structures)
• annotations from rule systems (incl. expert-curated rules)

Exploring C. elegans protein entries in UniProtKB

The Caenorhabditis protein annotation project: Characterisation of C. elegans

http://ibgwww.colorado.edu/tj-lab/worms/N2wormb.jpg

Work closely with the worm research community and with WormBase

At present, there are over 4000 manually reviewed C. elegans protein entries

Homepage: Search

Free text search bar

Default search is UniProtKB

All sections of UniProt, including specific data analysis tools, may be accessed through quick access tabs below the main search bar on the homepage

Methods for searching UniProtKB

• Free text search queries

Methods for searching UniProtKB

• Advanced search options: Clicking on “Help” allows you to find your field of interest more quickly

Search results

Reviewed/manually curated entries

Unreviewed entries

Filtering search results

Filter by:
• Entry type (reviewed/unreviewed)
• Organism

Change view of the data set identified in search

Selecting entries

Click to open entry

Tick box to select an entry or several entries to use tools and/or add to basket

Customise displayed results

Click “Columns” to customise the information displayed

Select information displayed in the search

Click “Save” to apply selection and display preferences

Downloading specific information

Customising the displayed results makes it possible to download exactly the data you need. With customised data columns, a tab-separated download contains only the selected columns. This is a convenient way to create concise datasets for further analysis.

Accessing an entry

Click on the accession number to open an entry

Entry view

Quick access tabs list the entry contents: click a tab to jump to that section and/or (de)select it to toggle content visibility

Types of data in a UniProtKB entry

• Functional annotation: comprehensive
• Names: standardized
• Sequences: isoforms, variants etc.
• Identifiers: unique, stable
• Links to specialized databases
• Amino acid-specific data
• Scientific literature
• Sequence analysis tools
• External databases

Data sources and referencing

Qualifiers at the end of sections and comments denote the data source and provide links to supporting data and additional information

Publications and referencing

Publications referenced throughout entry using their PubMed identifiers

List of publications accessed through quick access tab

Publications viewer

An expanded publications view incorporating filters and access to mapped publications.

Indicate whether the entry has been manually reviewed

Hyperlinks to the article in PubMed and Europe PMC

Hyperlink to a list of entries that also cite article

Indicate the information/data obtained from article and included in entry

Sequences: Using all isoforms

Describes the number of isoforms and how produced

Aligns ALL sequences

Adds ALL sequences to basket

Sequences: Using individual isoforms

Name of isoform

Provide details of EACH isoform

Tools can be applied to EACH isoform

Sequences of EACH isoform can be downloaded and added to basket

Feature table: Amino acid specific view (tabular)

Amino acid specific data is listed by type of modification

List amino acid specific data and drop-down list of relevant publications

Feature viewer: Amino acid specific view (graphical)

Disease Vs Disruption Phenotype

Disruption Phenotype: Describes null mutants and knockdown as a result of RNAi

Phenotypes as a result of single/ double/multiple mutations

Phenotypes as a result of RNAi, indicating developmental stage if applicable

Mutagenesis: Describes mutations that disrupt one or multiple amino acids

Provide name of mutant if known

Indicate if phenotype is due to mutations of several amino acids

Cross references

Provide links to over 150 external databases such as nucleotide sequence databases, model organism databases and genomics and proteomics resources.

Open reading frame (ORF) identifier

Protein sequence identifier in WormBase

WormBase entry accession

Gene name

Gene name and open reading frame identifiers in line with WormBase

Accessing UniProt Data

• UniProt is released every 4 weeks and is freely available - http://www.uniprot.org/downloads

• Provided in a range of formats - text, XML, XML/RDF, FASTA, GFF, tab-delimited

• Web site - Supports simple and complex queries - RESTful API - Documentation and Details at: http://uniprot.org/help/programmatic_access
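The points above can be combined in a short script: build a query URL, request tab-delimited output restricted to selected columns, and parse the result. This is only a sketch under assumptions. The endpoint `https://rest.uniprot.org/uniprotkb/search`, the parameter names (`query`, `format`, `fields`, `size`) and the field names are not taken from this webinar and may differ between website releases, so check them against the programmatic access documentation linked above. The helper names are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; confirm it in the programmatic access documentation.
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query, fields, size=25):
    """Build a search URL that asks for tab-separated output.

    `fields` lists the columns to return, mirroring the column
    customisation available on the website.
    """
    params = {
        "query": query,
        "format": "tsv",
        "fields": ",".join(fields),
        "size": size,
    }
    return BASE_URL + "?" + urlencode(params)


def parse_tsv(text):
    """Turn tab-separated text (header row first) into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def fetch_rows(query, fields, size=25):
    """Download and parse one page of results (requires network access)."""
    with urlopen(build_search_url(query, fields, size)) as response:
        return parse_tsv(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Reviewed C. elegans entries (NCBI taxonomy ID 6239); the query field
    # names used here are assumptions to verify in the documentation.
    print(
        build_search_url(
            "organism_id:6239 AND reviewed:true",
            ["accession", "gene_names", "length"],
            size=10,
        )
    )
```

Keeping URL construction and parsing separate from the download step means the first two can be checked without a network connection, and only `fetch_rows` depends on the live service.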

Useful links

Contact us: [email protected]

• Online training:
https://www.ebi.ac.uk/training/events/2018/exploring-protein-function-and-sequences-using-uniprot
https://www.ebi.ac.uk/training/online/course/enzymes-uniprot
https://www.ebi.ac.uk/training/online/course/exploring-models-human-disease-uniprot
https://www.ebi.ac.uk/training/online/course/uniprot-peptide-search-and-website-updates-webinar
https://www.ebi.ac.uk/training/online/course/uniprot-exploring-protein-sequence-and-functional
https://www.ebi.ac.uk/training/online/course/uniprot-exploring-protein-sequence-and-functional-0
https://www.ebi.ac.uk/training/online/course/uniprot-programmatic-access-uniprotkb-webinar
https://www.ebi.ac.uk/training/online/course/uniprot-quick-tourversion-0

UniProt funding

UniProt is supported by the National Institutes of Health (NIH), National Human Genome Research Institute (NHGRI) and National Institute of General Medical Sciences (NIGMS) grant U41HG007822.

Additional support for the EMBL-EBI's involvement in UniProt comes from European Molecular Biology Laboratory (EMBL), the British Heart Foundation (BHF) (RG/13/5/30112), the Parkinson's Disease United Kingdom (PDUK) GO grant G-1307, and the NIH GO grant U41HG02273.

UniProt activities at the SIB are additionally supported by the Swiss Federal Government through the State Secretariat for Education, Research and Innovation SERI.

PIR's UniProt activities are also supported by the NIH grants R01GM080646, G08LM010720, and P20GM103446, and the National Science Foundation (NSF) grant DBI-1062520.

UniProt Team

PIs: Alex Bateman, Cathy Wu, Alan Bridge

Key staff: Cecilia Arighi (Curation), Darren Natale (Content), Hongzhan Huang (Development), Manuela Pruess (Coordination), Maria Martin (Development), Michele Magrane (Curation), Nicole Redaschi (Development), Peter McGarvey (Content), Sandra Orchard (Content), Shriya Raj (Coordination), Sylvain Poux (Curation)

Content/Curation: Aleksandra Shypitsyna, Alistair MacDougall, Andre Stutz, Andrea Auchincloss, Anne Estreicher, Anne Morgat, Arnaud Gos, C. R. Vinayaka, Catherine Rivoire, Chantal Hulo, Christian Sigrist, Cristina Casals Casas, Damien Lieberherr, Elena Speretta, Elisabeth Coudert, Emma Hatton-Ellis, Emmanuel Boutet, Florence Jungo, George Georghiou, Ghislaine Argoud-Puy, Guillaume Keller, Hema Bye-A-Jee, Ivo Pedruzzi, John S. Garavelli, Karen Ross, Kate Warner, Kati Laiho, Klemens Pichler, Kristian Axelsen, Lai-Su Yeh, Lionel Breuza, Livia Famiglietti, Lucila Aimo, Marc Feuermann, Michael Tognolli, Nadine Gruaz, Nevila Hyka-Nouspikel, Nidhi Tyagi, Patrick Masson, Penelope Garmiri, Philippe Lemercier, Qinghua Wang, Ramona Britto, Rossana Zaru, Sandrine Pilbout, Shyamala Sundaram, Ursula Hinz, Yvonne Lussi

Development: Alan Da Silva, Alexandr Ignatchenko, Andrew Nightingale, Arnaud Kerhornou, Beatrice Cuche, Benoit Bely, Borisas Bursteinas, Chuming Chen, Delphine Baratin, Dushyanth Jyothi, Edouard De Castro, Edward Turner, Elisabeth Gasteiger, Emanuele Alpi, Guoying Qi, Hermann Zellner, Jerven Bolleman, Jian Zhang, Jie Luo, Joseph Onwubiko, Leonardo Gonzales, Leslie Arminski, Leyla Garcia Castro, Mark Bingley, Monica Pozzato, Parit Bansal, Rabie Saidi, Sangya Pundir, Sebastien Gehant, Teresa Batista Neto, Thierry Lombardot, Tony Sawford, Tony Wardell, Tunca Dogan, Vicente Lara, Vishal Joshi, Vladimir Volynkin, Wudong Liu, Xavier Martin, Xavier Watkins, Ying Yan, Yongxing Chen, Yuqi Wang

European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
Protein Information Resource (PIR), Washington DC and Delaware, USA
SIB Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland

Upcoming webinars

See the full list of upcoming webinars at https://www.ebi.ac.uk/training/webinars

Don’t forget! Please fill in the survey that launches after the webinar – thanks!