Sequence Submissions to UniProtKB

Gayatri Chavali1 and UniProt Consortium1,2,3
1 EMBL-European Bioinformatics Institute, Cambridge, UK
2 SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
3 Protein Information Resource, Georgetown University, Washington DC & University of Delaware, USA

Overview

UniProtKB is a comprehensive, centralised resource for protein sequence and functional information. Sequences in UniProtKB are derived primarily from translations of coding sequences submitted to the International Nucleotide Sequence Database Collaboration (ENA, GenBank and DDBJ). Other sources include the wwPDB and gene predictions from Ensembl and RefSeq. Protein sequences obtained by direct sequencing methods are also incorporated into the database as submissions.

Direct Submissions

UniProt accepts depositions of protein/peptide sequences determined directly by techniques such as Edman degradation and MS/MS. Direct data submissions are accepted from the research community via SPIN, the interactive web-based submission tool.

http://www.ebi.ac.uk/swissprot/Submissions/spin/index.jsp

In addition to the primary sequence, submitters are asked to provide:
• Source organism
• Source strain/tissue
• Citation details
• Experimental method used to obtain the sequence
• Any relevant characterisation data

SPIN submissions are annotated using data provided by the submitter, coupled with results from sequence analysis tools and information propagated from homologous sequences already present in the database.

During annotation, a dialogue is maintained with the submitter to ascertain the supporting evidence available for the deposited sequence. A unique accession number is assigned to each submitted sequence, which the submitter can use in subsequent publications.

Bulk Submissions

Sequence submissions occasionally consist of more than 50 sequences. In such cases, submitters may provide the sequences in a bulk FASTA file. Improvements are currently under way to streamline the treatment of bulk submissions.

The SPIN Interface

[Figure: The SPIN Interface. Image not recovered.]

Sequence Release

Submitted data are either released or moved into a confidential holding area, depending on the release/hold instructions provided by the submitter. Entries marked for release are made available as part of the next UniProt release cycle.

Deposition Metrics

[Figure: Direct submissions over years (2004-2012). Series: number of unique sequences, number of unique submitters.]

[Figure: Distribution of submissions on taxonomy. Categories: Mammalia, Amphibia, Aves, Reptiles, Insects, Molluscs, Plants, Fungi, Prokaryotes, Others.]

Future Developments

SPIN is currently under re-development to provide an enhanced submission interface, along with improvements in the handling of bulk submission data.

Funding

UniProt is funded by the European Molecular Biology Laboratory, National Institutes of Health, European Union, Swiss Federal Government, British Heart Foundation and National Science Foundation.

Email: [email protected]
URL: www.uniprot.org
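
As an illustration of the bulk submission route described under Bulk Submissions, a submitter might check a FASTA file before deciding how to deposit it. The sketch below is not part of SPIN and does not reproduce any UniProt code; the names read_fasta, is_bulk_submission and BULK_THRESHOLD are hypothetical. It assumes a plain FASTA layout (a ">" header line followed by one or more sequence lines) and treats the "more than 50 sequences" figure from the text as a hard cut-off, which is an assumption.

```python
from typing import Iterator, List, Optional, Tuple

# Threshold taken from the text ("more than 50 sequences"); treating it as a
# hard cut-off is an assumption made for this sketch.
BULK_THRESHOLD = 50


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated; lines before the first
            # header are ignored.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_bulk_submission(text: str) -> bool:
    """Return True when the file holds more than BULK_THRESHOLD records."""
    return sum(1 for _ in read_fasta(text)) > BULK_THRESHOLD
```

A file for which is_bulk_submission returns True would be a candidate for the bulk FASTA route; smaller sets could be entered through the interactive tool.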