Nucleic Acids Research, 2006, Vol. 34, Database issue D291–D295
doi:10.1093/nar/gkj059

MODBASE: a database of annotated comparative protein structure models and associated resources

Ursula Pieper1,2, Narayanan Eswar1,2, Fred P. Davis1,2, Hannes Braberg1,2, M. S. Madhusudhan1,2, Andrea Rossi1,2, Marc Marti-Renom1,2, Rachel Karchin1,2, Ben M. Webb1,2, David Eramian1,2,4, Min-Yi Shen1,2, Libusha Kelly1,2,5, Francisco Melo3 and Andrej Sali1,2,*
1Department of Biopharmaceutical Sciences and 2Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA, 3Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile, 4Graduate Group in Biophysics and 5Graduate Group in Biological and Medical Informatics, University of California, San Francisco, CA, USA
Received September 14, 2005; Accepted October 5, 2005
ABSTRACT

MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models for all available protein sequences that can be matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on MODELLER for fold assignment, sequence–structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE is updated regularly to reflect the growth in protein sequence and structure databases, and improvements in the software for calculating the models. MODBASE currently contains 3 094 524 reliable models for domains in 1 094 750 out of 1 817 889 unique protein sequences in the UniProt database (July 5, 2005); only models based on statistically significant alignments and models assessed to have the correct fold despite insignificant alignments are included. MODBASE also allows users to generate comparative models for proteins of interest with the automated modeling server MODWEB (http://salilab.org/modweb). Our other resources integrated with MODBASE include comprehensive databases of multiple protein structure alignments (DBAli, http://salilab.org/dbali), structurally defined ligand binding sites and structurally defined binary domain interfaces (PIBASE, http://salilab.org/pibase) as well as predictions of ligand binding sites, interactions between yeast proteins, and functional consequences of human nsSNPs (LS-SNP, http://salilab.org/LS-SNP).

INTRODUCTION

The genome sequencing efforts are providing us with complete genetic blueprints for hundreds of organisms, including humans. We are now faced with the challenge of assigning, investigating and modifying the functions of proteins encoded by these genomes. This task is generally facilitated by 3D structures of the proteins (1–3), which are best determined by experimental methods, such as X-ray crystallography and NMR spectroscopy. The number of experimentally determined structures deposited in the PDB increased from 23 096 to 31 823 over the last 2 years (August 2005) (4). However, the number of sequences in comprehensive sequence databases, such as UniProt (5) and GenPept (6), continues to grow even more rapidly, increasing from 1.2 to 2 million over the last 2 years (August 2005). Therefore, protein structure prediction is essential to obtain structural information for sequences where no experimental structure is available.

The most accurate models are generally obtained by homology or comparative modeling (7–10), a method that is applicable if an experimental structure related to a given target sequence is available. The fraction of sequences for which comparative models can be obtained automatically has increased moderately from 57 to 60% over the last 2 years, reflecting the counteracting effects of structural genomics (11,12) and many new genomic sequences.
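
As a quick consistency check, the short sketch below recomputes the coverage and growth figures from the counts quoted in the abstract and introduction. The variable and function names are illustrative only and are not part of MODBASE, MODPIPE or MODELLER; the sequence counts of 1.2 and 2 million are the rounded values given in the text.

```python
# Counts quoted in the text (UniProt snapshot of July 5, 2005; PDB and
# sequence database counts as of August 2005). All names are illustrative.
MODELS = 3_094_524             # reliable domain models in MODBASE
MODELED_SEQUENCES = 1_094_750  # sequences with at least one reliable model
ALL_SEQUENCES = 1_817_889      # unique UniProt sequences considered

PDB_BEFORE, PDB_AFTER = 23_096, 31_823        # PDB structures, 2 years apart
SEQ_BEFORE, SEQ_AFTER = 1_200_000, 2_000_000  # rounded sequence counts


def relative_growth(before: float, after: float) -> float:
    """Return the fractional increase from `before` to `after`."""
    return (after - before) / before


# Fraction of sequences with at least one reliable model.
coverage = MODELED_SEQUENCES / ALL_SEQUENCES
# Average number of reliable domain models per modeled sequence.
models_per_sequence = MODELS / MODELED_SEQUENCES
# Growth of the structure and sequence databases over the same 2 years.
pdb_growth = relative_growth(PDB_BEFORE, PDB_AFTER)
seq_growth = relative_growth(SEQ_BEFORE, SEQ_AFTER)

print(f"coverage:            {coverage:.1%}")             # 60.2%
print(f"models per sequence: {models_per_sequence:.2f}")  # 2.83
print(f"PDB growth:          {pdb_growth:.1%}")           # 37.8%
print(f"sequence growth:     {seq_growth:.1%}")           # 66.7%
```

The computed coverage of about 60% matches the upper end of the "57 to 60%" range quoted above, and the sequence databases grew by roughly two-thirds while the PDB grew by a little over one-third, which is consistent with the statement that sequences accumulate faster than structures.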
*To whom correspondence should be addressed. Tel: +1 415 514 4227; Fax: +1 415 514 4231; Email: [email protected]