Article (Published Version)

Swiss-Prot: Juggling between evolution and stability

Amos Bairoch, Brigitte Boeckmann, Serenella Ferro and Elisabeth Gasteiger

Date received (in revised form): 22nd December 2003

Reference: BAIROCH, Amos Marc, et al. Swiss-Prot: juggling between evolution and stability. Briefings in Bioinformatics, 2004, vol. 5, no. 1, p. 39-55. DOI: 10.1093/bib/5.1.39. PMID: 15153305. Available at: http://archive-ouverte.unige.ch/unige:38278

Abstract
We describe some of the aspects of Swiss-Prot that make it unique, explain what are the developments we believe to be necessary for the database to continue to play its role as a focal point of protein knowledge, and provide advice pertinent to the development of high-quality knowledge resources on one aspect or the other of the life sciences.

Amos Bairoch heads the Swiss-Prot group at the SIB and is a professor at the Department of Structural Biology and Bioinformatics of the University of Geneva.

Brigitte Boeckmann has been working in the Swiss-Prot group for 16 years. She has been involved in annotation and tool development and is now coordinating automatic annotation in Swiss-Prot.

Serenella Ferro has a background in biochemistry and chemistry and has worked as a Swiss-Prot head annotator for 15 years.

Elisabeth Gasteiger coordinates software development in the SIB Swiss-Prot group and is in charge of the ExPASy server.

Keywords: protein sequence, database, functional annotation, automatic annotation, sequence analysis, user feedback

Amos Bairoch, Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 Rue Michel Servet, 1211 Geneva 4, Switzerland
Tel: +41 22 379 50 50 Fax: +41 22 379 58 58 E-mail: [email protected]

© HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 5. NO 1. 39–55. MARCH 2004

INTRODUCTION

The goal of this article is not to depict the history of Swiss-Prot,¹ as this has already been done elsewhere,² but rather to explore some of the consequences of decisions taken about 20 years ago, to discuss how the database has constantly evolved and to describe the challenges that it currently faces. To say that the past 20 years have been exciting would be a major understatement. Most young scientists now starting a career in the life science fields are not aware of how much the combined technological revolutions that led to high-throughput sequencing and the WWW have quantitatively and qualitatively changed the universe of knowledge on proteins. Yet, while we now have to cater in the Swiss-Prot and TrEMBL sections of the UniProt knowledgebase³ for more than 1 million protein sequences, there is a continuously widening chasm between truly characterised proteins and those that have been solely predicted by genome-sequencing projects. For us, in Swiss-Prot, the ultimate in terms of a well-characterised protein is one for which not only the exact sequence, post-translational modifications, subcellular location, tissue specificity, interaction partners and 3D structure are known, but more crucially for which a functional role can be assigned.

What we hope to convey in this paper are the particular aspects of Swiss-Prot that make it unique, and hopefully derive some advice that would be pertinent to someone embarking on the development of a high-quality knowledge resource on one aspect or the other of the life sciences. But before we do so, we want to enumerate six observations that we believe are important to communicate to any would-be developers of such databases:

• Your task will be much more complex and far bigger than you ever thought it could be.
• If your database is successful and useful to the user community, then you will have to dedicate all your efforts to develop it for a much longer period of time than you would have thought possible.
• You will always wonder why life scientists abhor complying with nomenclature guidelines or standardisation efforts that would simplify your and their life.
• You will have to continually fight to obtain a minimal amount of funding.
• As with any service efforts, you will be told far more what you do wrong rather than what you do right.
• But when you will see how useful your efforts are to your users, all the above drawbacks will lose their importance!

A SMALL BIT OF HISTORICAL INTROSPECTION

How Swiss-Prot started and how it institutionally evolved

In 1965, the late Margaret Dayhoff published the first edition of the ‘Atlas of Protein Sequence and Structure’.⁴ It contained information on 65 protein sequences. In the introduction she expressed the mission of the Atlas as

locating all of the relevant publications; critically reviewing the data and resolving conflicting reports; transforming the data into a uniform format to reflect those aspects of the structure that have been experimentally determined and those that could reasonably be inferred by homology; identifying the material with regard to chemical function, biological source, genetic control, and evolutionary origin...

This ambitious and still highly pertinent mission statement is a tribute to the vision shown by Margaret Dayhoff. She pursued her task until her untimely death in 1983. At that time the Atlas had evolved into a protein sequence data bank known as the Protein Identification Resource (PIR) of the National Biomedical Research Foundation (NBRF). When in 1985, one of us (Amos Bairoch) was, in the context of a PhD thesis, developing a software package (PC/Gene⁵) to analyse protein sequences, he was faced with some deficiencies and omissions in the PIR database. As he did not receive satisfactory feedback from PIR, he resolved to develop a version of PIR in the format of the European Molecular Biology Laboratory (EMBL) nucleotide sequence database that would contain additional sequences and, more crucially, additional annotations on various aspects of the protein universe.

In mid-1986, the first release of Swiss-Prot came out. Almost immediately we approached the EMBL to see if they were interested in distributing and helping with the maintenance of the database. With foresight they immediately accepted. The collaboration that grew from this early decision gave rise to the current situation: Swiss-Prot is a fully collaborative endeavour of what has become the Swiss-Prot group at the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI), an outstation of EMBL. The last institutional development was the decision, in late 2003, of the NIH to award a major grant to a consortium composed of the EBI, the SIB and PIR to produce a universal resource on proteins, known as UniProt.

Margin note: Swiss-Prot contains mostly manually annotated entries

Today, in 2004, more than 120 people directly work on Swiss-Prot and TrEMBL (see below) or on resources that evolved out of Swiss-Prot. While the first reaction to this figure can be ‘that’s a lot of people’, it pales when compared with the amount of work to be carried out. In fact this is a major issue shared by all life sciences information resources: long-term, high-quality curation of information is not cheap. It is not as glamorous as whole genome sequencing projects or any such well-defined scientific and technological efforts, yet it needs to be adequately and stably funded. Sadly, this is not yet widely recognised by funding bodies.

Why TrEMBL was developed

Margin note: TrEMBL consists of computer-annotated entries, which are not yet in Swiss-Prot

In the mid-1990s it was already clear that the increased data flow from genome projects was going to be a major challenge for Swiss-Prot. As will be explained further on, maintaining the high quality of the database requires careful sequence analysis and detailed annotation of every entry. This was, and still is, a major rate-limiting step. We did not wish to relax the editorial standards of Swiss-Prot and there was a limit to how much the annotation procedures could be accelerated. Yet it was vital to make new sequences available as quickly as possible. To address this concern, we introduced in 1996 TrEMBL (Translation of EMBL). TrEMBL consists of computer-annotated entries derived from the translation of all coding sequences in the EMBL database, except for those already included in Swiss-Prot. TrEMBL is therefore a complement to Swiss-Prot and sequence entries only move out from TrEMBL and enter Swiss-Prot after having been manually curated by an annotator.

From 1996 to the end of 2003, Swiss-Prot grew by 83,000 sequences to reach a total of 140,000 entries. In this period of time, TrEMBL grew from the 86,000 entries in its first release to about 1.1 million entries!

WHAT MAKES SWISS-PROT SPECIAL

Margin note: The correct protein sequence is the basis for

[...] machines. In 1986, most nucleotide sequences submitted to the DNA databases originated from individual laboratories that were sequencing a single gene or a small region of a genome. Today, the biggest (in terms of quantity) contributors are major sequencing centres that either provide complete genomic sequences or massive amounts of data from full-length cDNAs.

As we depend on primary sequence data that have been submitted to the nucleotide sequence databases, it would seem at first glance that there is not really anything we can do to improve the quality of the derived protein sequences.
