
Gayatri Chavali¹ and UniProt Consortium¹,²,³
¹ EMBL-European Bioinformatics Institute, Cambridge, UK
² SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
³ Protein Information Resource, Georgetown University, Washington DC & University of Delaware, USA

Sequence Submissions to UniProtKB

Overview

UniProtKB is a comprehensive, centralised resource for protein sequence and functional information. Sequences in UniProtKB are derived primarily from translations of coding sequences submitted to the International Nucleotide Sequence Database Collaboration (INSDC) members ENA, GenBank and DDBJ. Other sources include the wwPDB and predictions from Ensembl and RefSeq. Protein sequences obtained by direct sequencing methods are also incorporated into the database as submissions.

[Figure: The SPIN Interface]

Direct Submissions

UniProt accepts depositions of direct protein/peptide sequences derived from techniques such as Edman degradation and MS/MS. Direct data submissions are accepted from the research community via SPIN, the interactive web-based submission tool.

http://www.ebi.ac.uk/swissprot/Submissions/spin/index.jsp

In addition to the primary sequence, submitters are asked to provide:
• Source
• Source strain/tissue
• Citation details
• Experimental method used to obtain sequence
• Any relevant characterisation data
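The items requested from submitters can be pictured as a simple record with a completeness check. The sketch below is only an illustration: the class name, field names, the choice of which fields are mandatory, and the restriction to the 20 standard amino-acid letters are all assumptions made for this example and do not describe SPIN's actual schema or validation rules.

```python
from dataclasses import dataclass

# 20 standard one-letter amino-acid codes (assumption: ambiguity codes
# such as X, B or Z are treated as invalid in this sketch).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class DirectSubmission:
    """Hypothetical record mirroring the items listed above."""
    sequence: str
    source: str
    strain_or_tissue: str
    citation: str
    method: str                 # e.g. "Edman degradation" or "MS/MS"
    characterisation: str = ""  # "any relevant" data, so optional here


def missing_fields(sub: DirectSubmission) -> list[str]:
    """Return the names of mandatory fields that are empty."""
    required = ("sequence", "source", "strain_or_tissue", "citation", "method")
    return [name for name in required if not getattr(sub, name).strip()]


def invalid_residues(sequence: str) -> list[str]:
    """Return the sorted set of letters that are not standard residues."""
    return sorted(set(sequence.upper()) - AMINO_ACIDS)
```

A record with an empty citation would be reported by `missing_fields` as incomplete, and `invalid_residues` would list any letters outside the assumed alphabet so they can be queried with the submitter.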

SPIN submissions are annotated using data provided by the submitter coupled with results from tools and information propagated from homologous sequences already present in the database.

Annotation is carried out by maintaining a dialogue with the submitter to ascertain the supporting evidence available for the deposited sequence. A unique accession number is assigned to each submitted sequence which can be used by the submitter in subsequent publications.

Bulk Submissions

Sequence submissions occasionally consist of more than 50 sequences. In the event of such situations submitters may provide the sequences in a bulk file. Currently, improvements are on-going to streamline the treatment of bulk submissions.

Sequence Release

Submitted data are either released or moved into a confidential holding area depending on the release/hold instructions provided by the submitter. Entries marked for release are made available as part of the next UniProt release cycle.

Deposition Metrics

[Figure: Direct submissions over years (2004-2012). Series: number of unique sequences, number of unique submitters.]

[Figure: Distribution of submissions on taxonomy. Categories: Mammalia, Amphibia, Aves, Reptiles, Insects, Molluscs, Plants, Fungi, Prokaryotes, Others.]

Future Developments

SPIN is currently under re-development to provide an enhanced submission interface along with improvements in handling bulk submission data.
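The bulk-file threshold and the release/hold handling described above can be expressed as two small decision rules. This is a minimal sketch under stated assumptions: the function and enum names are hypothetical, and only the figure of 50 sequences and the two outcomes (next release cycle, confidential holding area) come from the text.

```python
from enum import Enum

# From the text: submissions of more than 50 sequences may use a bulk file.
BULK_THRESHOLD = 50


class Instruction(Enum):
    """Hypothetical encoding of the submitter's release/hold instruction."""
    RELEASE = "release"
    HOLD = "hold"


def is_bulk(sequences: list[str]) -> bool:
    """True when the submission exceeds the bulk threshold."""
    return len(sequences) > BULK_THRESHOLD


def route(instruction: Instruction) -> str:
    """Map the submitter's instruction to where the entry goes."""
    if instruction is Instruction.RELEASE:
        return "next release cycle"
    return "confidential holding area"
```

Note that exactly 50 sequences does not count as bulk in this sketch, since the text says "more than 50".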

Funding

UniProt is funded by the European Molecular Biology Laboratory, National Institutes of Health, European Union, Swiss Federal Government, British Heart Foundation and National Science Foundation.

Email: help@.org

URL: www.uniprot.org