Protein Data Bank: an Open Access Resource Enabling Basic and Applied Research and Education in Biology and Medicine


John Westbrook, Ph.D.
RCSB PDB Data & Software Architect Lead

Overview
• A bit of background about the PDB
• PDB data content
• PDB data representation and data quality standards
• The PDB biocuration platform
• Some key features delivered by the RCSB PDB

Protein Data Bank
• First open access digital resource in biology (est. 1971 with 7 entries)
• Single global archive of 3-D macromolecular structures (contains >120,000 entries)
• Freely available to all at pdb.org
• US PDB headquartered at Rutgers/UCSD (NSF, NIH, DOE)
• US PDB part of Worldwide PDB with partners in EU and Japan

Worldwide Protein Data Bank
• Established in 2003, wwPDB ensures data are freely & globally available in a common repository
• Collaborate on data quality and representation standards, and tools and procedures for biocuration
• Each partner delivers different services and views of the common repository of data
[Photo: wwPDB Advisory Committee Meeting, 2015]

[Figure: timeline of science, technology, and community milestones, 1970s to 2010s]
• Science: small enzymes, RNA, DNA, protein-DNA complexes, ribosomes, large viruses, macromolecular machines
• Technology: 1970s: X-ray diffraction, diffractometers, punched cards; 1980s: synchrotron radiation, computer graphics, NMR; 1990s: electron microscopy, fast computers, fast detectors; 2000s: high-throughput structural genomics, robots; 2010s: hybrid methods
• Community: 1970s: PDB archive established; 1980s: IUCr deposition guidelines; 1990s: standardization, RCSB; 2000s: experimental data required, wwPDB; 2010s: validation standards

Diverse Molecular Content of the PDB

Rapidly Evolving Experimental Technologies
[Figure: three panels plotting entries in the archive by year for X-ray crystallography, nuclear magnetic resonance spectroscopy, and 3D electron microscopy, alongside a 1975–2015 timeline labelled integrative and hybrid methods]

PDB Growth and Data Usage
PDB Depositors
• >800 new entries/month
• As of 2015, ~55% increase in the number of global depositions since 2008
[Figure: Growth in PDB Depositions. Total and projected number of annual depositions (# of entries) by year, 2000–2018]
PDB Users
• FTP and RSYNC download traffic in 2015: 526 million downloads
• RCSB PDB: 347 million; PDBe: 100 million; PDBj: 58 million

PDB Data Content
• Atomic coordinates and primary experimental data
• Sample composition and preparation details
• Protein and nucleic acid polymer sequences and taxonomy
• Small molecules (ligands)
• Experimental data collection, structure solution, and structure refinement
• Structure classification by sequence, function and other criteria
• Citation and references to related data resources
• PDB entries contain 250 – 1200+ unique items of data
[Figure: The 3.8 angstrom resolution cryo-EM structure of Zika virus. Sirohi, D., Chen, Z., Sun, L., Klose, T., Pierson, T.C., Rossmann, M.G., Kuhn, R.J. (2016) Science 352: 467-470]

Community Data Standards
• PDB manages data using the macromolecular extension to the Crystallographic Information Framework (mmCIF) originally developed as an IUCr data standard
• PDB coordinates the extension of the standard (PDBx/mmCIF) to support the broader needs of both contributors and users of the archive (> 4400 data terms)
• A PDBx/mmCIF Working Group of community experts and methods developers oversees the evolution of the standard and ensures that the standard is well supported by key community software tools
• PDB hosts community workshops to support the data standard and maintains a web site serving PDBx/mmCIF data dictionaries, schema and software tools (mmcif.wwpdb.org)
[Figure: PDBx format round trip: Structure Determination in the Lab, wwPDB Deposition, wwPDB Processing and Annotation, In PDB Archive]
[Photos: Workshop Participants, September 2011; Workshop Participants, October 2014]

PDBx/mmCIF Development Timeline
[Figure: timeline with marks at 1991, 1994, 1997, 2000, 2003, 2006, 2009, 2012]
• Groups: IUCr mmCIF Working Party; IUCr mmCIF Maintenance Group
• Dictionaries: Core CIF V1; mmCIF V1; mmCIF V2; mmCIF/Core sync'd; DDL 1; DDL 2; mmCIF + Extensions; PDB Exchange Dictionary
• Workshops: Rutgers, York, CARB, Honolulu, Glasgow, Tarrytown, St. Louis, Orlando, Brussels, Seattle, Rutgers, EBI
• wwPDB: One Archive – One Dictionary; wwPDB Common Deposition & Annotation System

Community Standards for Data Quality
Method-specific Community Validation Task Forces have been convened to collect recommendations and develop consensus on data quality standards, identify software tools to perform required validation tasks, and to define related content requirements for archiving.

Task Force | Meeting/Workshop | Chair(s)/Membership | Outcomes
X-ray Validation Task Force | 2008, 2015 | Randy Read (Univ of Cambridge); 17 members | (2011) Structure 19: 1395-1412
NMR Validation Task Force | 2009, 2011, 2013 (x2), 2015, 2016 | Gaetano Montelione (Rutgers), Michael Nilges (Institut Pasteur); 10 members | (2013) Structure 21: 1563-1570
3DEM Validation Task Force | 2010 | Richard Henderson (MRC-LMB), Andrej Sali (UCSF); 21 members | (2012) Structure 20: 205-214
Small-Angle Scattering Task Force | 2011, 2014 | Jill Trewhella (Univ Sydney); 6 members | (2013) Structure 21: 875-881
Hybrid Methods Task Force | 2014 | Andrej Sali (UCSF), Torsten Schwede (Univ Basel), Jill Trewhella (Univ Sydney); 27 members | (2015) Structure 23: 1156-1167

Presenting Data Quality to Diverse Audiences
• Provide relative and absolute quality metrics in graphical format
• Provide tabulations of key data, refinement statistics, and quality diagnostics
• Assess all macromolecular and ligand structural components
• PDF format reports can be uploaded with manuscript submission to a journal
• Diagnostics also delivered as XML format data files
[Figures: Overall Quality; Residue Plots]
Residue plot legend: Grey – not modeled; Green, yellow, orange, red – 0, 1, 2, 3 or more issues; Red dot – poor fit to electron density

OneDep – The PDB Biocuration Platform
OneDep: a unified global deposition, biocuration, and validation system
1. Data Harvesting
2. Pre-deposition Validation (validate.wwpdb.org)
3. Deposition (deposit.wwpdb.org)
4. Biocuration
5. Public Release
Other labels in the diagram: polymer check, ligand check, electron density fit (OK, 1 issue, 2 issues, 3+ issues); ftp://ftp.wwpdb.org, wwpdb.org/deposition, rcsb.org
Data providers
• Generate atomic coordinate and experimental data files
• Assemble mandatory data items for deposition
Data users
• Access data via web and ftp download and web services
• Enable other research for user community

RCSB PDB Data Delivery Pipeline
[Figure: data delivery pipeline, starting from Deposition & Biocuration]

RCSB PDB Web Portal
• Launchpad for a wide range of functionalities
  o Deposit
  o Search
  o Analyze
  o Visualize
  o Tabulate
  o Download
http://rcsb.org

RCSB PDB Mobile App
• Provides convenient access to PDB data on the go with a minimal feature set
• Provides a browser, simple search, and an interactive 3D viewer
• Supports iPhone, iPad, and Android
http://www.rcsb.org/pdb/static.do?p=mobile/RCSBapp.html

Web Services (RESTful APIs)
• Programmatic access to data: application-to-application communication
• Provides external workflows and analysis tools with direct access to a wide range of PDB data and services
• Enables integration of PDB data and services with programs and scripts in a variety of computer languages and computing environments
http://www.rcsb.org/pdb/software/rest.do

Enabling Data Access Through Integration and Visualization
• Protein Feature View: mapping protein sequence to 3D structure
• Gene View: mapping genome location to 3D structure
• Visualization
  o Browser native visualization tools using an efficient data compression protocol
  o Small molecule electron density and binding interactions
• Data integration resource files provided for download:
  o Correspondences with CCDC ligand structures
  o Sequence cluster data files
  o Phased release of data to support blinded molecular docking tests

Protein Sequence Integrated View

Genome Sequence Integrated View

Molecular Visualization
http://mmtf.rcsb.org/
https://github.com/arose/ngl

Visualizing Small Molecule Interactions

Reaching Diverse User Communities
Who are our users? | What are they using?
Biologists: structural biology, biophysics, biochemistry, genetics, immunology, pharmacology, cell and molecular biology … | RCSB PDB website, deposition tools, data
Other scientists: bioinformatics, software developers, … | Web Services, search engines, data
Students & teachers | PDB-101
Media: writers, textbook authors, patient advocacy groups, … | Images, data, information, outreach material, e.g., posters
General public: curious/interested individuals, artists, sculptors, … | Images, Molecule of the Month, information from external media

Online Educational Resources
Resources to help understand biology at the molecular level
http://pdb101.rcsb.org/
• Animations
• Paper Models
• Posters

Molecule of the Month

PDB Management
The Protein Data Bank Archive is managed by the Worldwide Protein Data Bank (wwpdb.org).
[Photo: PDB members past and present at the PDB40 Anniversary Symposium, 2011]
Member | Social media | Funding
rcsb.org | RCSB Protein Data Bank, @buildmodels | NSF, NIH, DOE
pdbe.org | proteindatabank, @PDBEurope | EMBL-EBI, Wellcome Trust, BBSRC, NIGMS, EU
pdbj.org | ja-jp.facebook.com/PDBjapan, @PDB_ja | NBDC-JST
bmrb.wisc.edu | | NLM
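
The residue-plot legend given under Presenting Data Quality to Diverse Audiences (grey for residues that are not modeled; green, yellow, orange, red for 0, 1, 2, 3 or more issues) can be sketched as a small lookup. This is only an illustration of the legend as stated on the slide, not the wwPDB validation software: the function names, the constant names, and the use of None to mark a residue that is not modeled are all assumptions made for this example.

```python
from typing import List, Optional

# Legend as stated on the slide. All names below are hypothetical.
NOT_MODELED_COLOR = "grey"
ISSUE_COLORS = ("green", "yellow", "orange", "red")  # 0, 1, 2, 3 or more issues


def residue_color(issue_count: Optional[int]) -> str:
    """Return the legend color for one residue.

    issue_count is None when the residue is not modeled (an assumption of
    this sketch); otherwise it is the number of issues flagged for it.
    """
    if issue_count is None:
        return NOT_MODELED_COLOR
    if issue_count < 0:
        raise ValueError("issue_count must be non-negative")
    # Any count of 3 or more maps to the last color in the legend (red).
    return ISSUE_COLORS[min(issue_count, len(ISSUE_COLORS) - 1)]


def color_residues(issue_counts: List[Optional[int]]) -> List[str]:
    """Map a sequence of per-residue issue counts to legend colors."""
    return [residue_color(count) for count in issue_counts]
```

For example, color_residues([None, 0, 2, 5]) would assign grey, green, orange, and red in that order. The red dot for poor fit to electron density is a separate marker in the legend and is left out of this sketch.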