Webinar Instructions

• GoToTraining works best in Chrome or Internet Explorer; avoid Firefox, which has audio issues on Macs
• To access the full features of GoToTraining, use the desktop version by clicking on the flower symbol and selecting “switch to desktop version”
• All microphones will be muted whilst the trainer is speaking
• If you have a question, please use the chat box at the bottom of the GoToTraining box

Exploring protein function and sequence using UniProt
Hema Bye-A-Jee
[email protected]

UniProt (Universal Protein Resource)
A comprehensive, high-quality and freely accessible resource of protein sequence and functional information:
• Detailed information on protein function, interactions, pathways etc.
• Sequences, including isoforms, disease variants and PTMs
• Stable identifiers (accessions)

• European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
• Protein Information Resource (PIR), Washington DC and Delaware, USA
• SIB Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland

Species Coverage in UniProt
• Protein sequence and functional information for a large variety of species.
• Enable cross-species comparison of orthology and conservation.
• Annotation projects for key model organisms:
  - Work closely with model organism databases
  - Communicate advances in research to the whole scientific community.

Resources we provide and their links

The two datasets in UniProtKB
Reviewed entries (~0.5 million):
• expert-curated information on all aspects of a protein, e.g.:
  - function
  - residue-specific data (e.g. active sites)
• protein isoforms of one gene in one UniProtKB record (per species)
Unreviewed entries (~100 million):
• protein isoforms of one gene in different UniProtKB records (per species)
• mapped experimental sequence features (from protein 3D structures)
• annotations from rule systems (incl. expert-curated rules)

Exploring C. elegans protein entries in UniProtKB
The Caenorhabditis protein annotation project: Characterisation of C. elegans proteins
• Work closely with the worm research community and with WormBase
• At present, there are over 4000 manually reviewed C. elegans protein entries
Image source: http://ibgwww.colorado.edu/tj-lab/worms/N2wormb.jpg

Homepage: Search
• Free text search bar
• Default search is UniProtKB
• All sections of UniProt, including specific data analysis tools such as BLAST, may be accessed through the quick access tabs below the main search bar on the homepage

Methods for searching UniProtKB
• Free text search queries
• Advanced search options: clicking on “Help” allows you to find your field of interest more quickly

Search results
• Reviewed/manually curated entries
• Unreviewed entries

Filtering search results
• Filter by:
  - Entry type (reviewed/unreviewed)
  - Organism
• Change view of the data set identified in search

Selecting entries
• Click to open entry
• Tick box to select an entry or several entries to use tools and/or add to basket

Customise displayed results
• Click “Columns” to customise the information displayed
• Select information displayed in the search
• Click “Save” to apply selection and display preferences

Downloading specific information
Customising the displayed results allows very specific data to be downloaded. With customised data columns, a tab-separated download contains only the selected columns. This is a convenient way to create concise datasets for further analysis.
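The customised, column-limited download described above can also be approximated in a script. The following is a minimal sketch rather than an official recipe: it assumes the current UniProt REST search endpoint at `https://rest.uniprot.org/uniprotkb/search` with `query`, `format`, `fields` and `size` parameters, and the field names `accession`, `gene_names` and `length`; all of these should be checked against the programmatic access documentation. The helper names `build_search_url` and `parse_tsv` are hypothetical, and the sample rows are made up so that the example runs without network access.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint of the current UniProt REST API; verify it against the
# programmatic access documentation before relying on it.
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query, fields, size=25):
    """Return a search URL asking for tab-separated output with only `fields`."""
    params = {
        "query": query,
        "format": "tsv",
        "fields": ",".join(fields),
        "size": size,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def parse_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


# Reviewed C. elegans entries (NCBI taxonomy ID 6239), three columns only.
url = build_search_url(
    "organism_id:6239 AND reviewed:true",
    ["accession", "gene_names", "length"],
)

# Made-up rows standing in for a downloaded file, so the sketch runs offline.
sample = "Entry\tGene Names\tLength\nX00001\tabc-1\t120\nX00002\tdef-2\t345\n"
rows = parse_tsv(sample)
for row in rows:
    print(row["Entry"], row["Gene Names"], row["Length"])
```

To work with real data, the text passed to `parse_tsv` would come from fetching `url` (for example with `urllib.request.urlopen`) or from a file saved through the website's download option. Keeping URL construction separate from parsing means the same parser works for both.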

Accessing an entry
Click on the accession number to open an entry

Entry view
Quick access tabs list entry contents: click to jump to a section and/or (de)select to toggle content visibility

Types of data in a UniProtKB entry
• Functional annotation: comprehensive
• Names: standardized
• Sequences: isoforms, variants etc.
• Identifiers: unique, stable
• Links to specialized databases
• Amino acid-specific data
• Scientific literature
• Sequence analysis tools
• External databases

Data sources and referencing
Qualifiers at the end of sections and comments denote the data source and provide links to supporting data and additional information

Publications and referencing
• Publications referenced throughout entry using their PubMed identifiers
• List of publications accessed through quick access tab

Publications viewer
An expanded publications view incorporating filters and access to mapped publications.
• Indicates whether the entry has been manually reviewed
• Hyperlinks to the article in PubMed and Europe PMC
• Hyperlink to a list of entries that also cite the article
• Indicates the information/data obtained from the article and included in the entry

Sequences: Using all isoforms
• Describes the number of isoforms and how produced
• Aligns ALL sequences
• Adds ALL sequences to basket

Sequences: Using individual isoforms
• Name of isoform
• Provide details of EACH isoform
• Tools can be applied to EACH isoform
• Sequences of EACH isoform can be downloaded and added to basket

Feature table: Amino acid specific view (tabular)
• Amino acid specific data is listed by type of modification
• List amino acid specific data and drop-down list of relevant publications

Feature viewer: Amino acid specific view (graphical)

Disease vs Disruption Phenotype
Disruption Phenotype: Describes null mutants and knockdown as a result of RNAi
• Phenotypes as a result of single/double/multiple mutations
• Phenotypes as a result of RNAi, indicating developmental stage if applicable
Mutagenesis: Describes mutations that disrupt one or multiple amino acids
• Provide name of mutant if known
• Indicate if phenotype is due to mutations of several amino acids

Cross references
Provide links to over 150 external databases such as nucleotide sequence databases, model organism databases and genomics and proteomics resources.
• Open reading frame (ORF) identifier
• Protein sequence identifier in WormBase
• WormBase entry accession
• Gene name
• Gene name and open reading frame identifiers in line with WormBase

Accessing UniProt Data
• UniProt is released every 4 weeks and is freely available
  - http://www.uniprot.org/downloads
• Provided in a range of formats
  - text, XML, XML/RDF, FASTA, GFF, tab-delimited
• Web site
  - Supports simple and complex queries
  - RESTful API
  - Documentation and details at: http://uniprot.org/help/programmatic_access

Useful links
• Contact us: [email protected]
• Online training:
  - https://www.ebi.ac.uk/training/events/2018/exploring-protein-function-and-sequences-using-uniprot
  - https://www.ebi.ac.uk/training/online/course/enzymes-uniprot
  - https://www.ebi.ac.uk/training/online/course/exploring-models-human-disease-uniprot
  - https://www.ebi.ac.uk/training/online/course/uniprot-peptide-search-and-website-updates-webinar
  - https://www.ebi.ac.uk/training/online/course/uniprot-exploring-protein-sequence-and-functional
  - https://www.ebi.ac.uk/training/online/course/uniprot-exploring-protein-sequence-and-functional-0
  - https://www.ebi.ac.uk/training/online/course/uniprot-programmatic-access-uniprotkb-webinar
  - https://www.ebi.ac.uk/training/online/course/uniprot-quick-tourversion-0

UniProt funding
UniProt is supported by the National Institutes of Health (NIH), National Human Genome Research Institute (NHGRI) and National Institute of General Medical Sciences (NIGMS) grant U41HG007822. Additional support for the EMBL-EBI's involvement in UniProt comes from the European Molecular Biology Laboratory (EMBL), the British Heart Foundation (BHF) (RG/13/5/30112), the Parkinson's Disease United Kingdom (PDUK) GO grant G-1307, and the NIH GO grant U41HG02273. UniProt activities at the SIB are additionally supported by the Swiss Federal Government through the State Secretariat for Education, Research and Innovation SERI.
PIR's UniProt activities are also supported by the NIH grants R01GM080646, G08LM010720, and P20GM103446, and the National Science Foundation (NSF) grant DBI-1062520.

UniProt Team
PIs: Alex Bateman, Cathy Wu, Alan Bridge

Key staff: Cecilia Arighi (Curation), Darren Natale (Content), Hongzhan Huang (Development), Manuela Pruess (Coordination), Maria Martin (Development), Michele Magrane (Curation), Nicole Redaschi (Development), Peter McGarvey (Content), Sandra Orchard (Content), Shriya Raj (Coordination), Sylvain Poux (Curation)

Content/Curation: Aleksandra Shypitsyna, Alistair MacDougall, Andre Stutz, Andrea Auchincloss, Anne Estreicher, Anne Morgat, Arnaud Gos, C. R. Vinayaka, Catherine Rivoire, Chantal Hulo, Christian Sigrist, Cristina Casals Casas, Damien Lieberherr, Elena Speretta, Elisabeth Coudert, Emma Hatton-Ellis, Emmanuel Boutet, Florence Jungo, George Georghiou, Ghislaine Argoud-Puy, Guillaume Keller, Hema Bye-A-Jee, Ivo Pedruzzi, John S. Garavelli, Karen Ross, Kate Warner, Kati Laiho, Klemens Pichler, Kristian Axelsen, Lai-Su Yeh, Lionel Breuza, Livia Famiglietti, Lucila Aimo, Marc Feuermann, Michael Tognolli, Nadine Gruaz, Nevila Hyka-Nouspikel, Nidhi Tyagi, Patrick Masson, Penelope Garmiri, Philippe Lemercier, Qinghua Wang, Ramona Britto, Rossana Zaru, Sandrine Pilbout, Shyamala Sundaram, Ursula Hinz, Yvonne Lussi

Development: Alan Da Silva, Alexandr Ignatchenko, Andrew Nightingale, Arnaud Kerhornou, Beatrice Cuche, Benoit Bely, Borisas Bursteinas, Chuming Chen, Delphine Baratin, Dushyanth Jyothi, Edouard De Castro, Edward Turner, Elisabeth Gasteiger, Emanuele Alpi, Guoying Qi, Hermann Zellner, Jerven Bolleman, Jian Zhang, Jie Luo, Joseph Onwubiko, Leonardo Gonzales, Leslie Arminski, Leyla Garcia Castro, Mark Bingley, Monica Pozzato, Parit Bansal, Rabie Saidi, Sangya Pundir, Sebastien Gehant, Teresa Batista Neto, Thierry Lombardot, Tony Sawford, Tony Wardell, Tunca Dogan, Vicente Lara, Vishal Joshi, Vladimir Volynkin, Wudong Liu, Xavier Martin, Xavier Watkins, Ying Yan, Yongxing Chen, Yuqi Wang

• European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
• Protein Information Resource (PIR), Washington DC and Delaware, USA
• SIB Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland

Upcoming webinars
See the full list of upcoming webinars at https://www.ebi.ac.uk/training/webinars

Don’t forget!
Please fill in the survey that launches after the webinar. Thanks!
