Nucleic Acids Research, 2006, Vol. 34, Database issue D291–D295
doi:10.1093/nar/gkj059

MODBASE: a database of annotated comparative protein structure models and associated resources

Ursula Pieper1,2, Narayanan Eswar1,2, Fred P. Davis1,2, Hannes Braberg1,2, M. S. Madhusudhan1,2, Andrea Rossi1,2, Marc Marti-Renom1,2, Rachel Karchin1,2, Ben M. Webb1,2, David Eramian1,2,4, Min-Yi Shen1,2, Libusha Kelly1,2,5, Francisco Melo3 and Andrej Sali1,2,*

1Department of Biopharmaceutical Sciences and 2Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA, 3Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile, 4Graduate Group in Biophysics and 5Graduate Group in Biological and Medical Informatics, University of California, San Francisco, CA, USA

Received September 14, 2005; Accepted October 5, 2005

ABSTRACT

MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models for all available protein sequences that can be matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on MODELLER for fold assignment, sequence–structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE is updated regularly to reflect the growth in protein sequence and structure databases, and improvements in the software for calculating the models. MODBASE currently contains 3 094 524 reliable models for domains in 1 094 750 out of 1 817 889 unique protein sequences in the UniProt database (July 5, 2005); only models based on statistically significant alignments and models assessed to have the correct fold despite insignificant alignments are included. MODBASE also allows users to generate comparative models for proteins of interest with the automated modeling server MODWEB (http://salilab.org/modweb). Our other resources integrated with MODBASE include comprehensive databases of multiple protein structure alignments (DBAli, http://salilab.org/dbali), structurally defined ligand binding sites and structurally defined binary domain interfaces (PIBASE, http://salilab.org/pibase) as well as predictions of ligand binding sites, interactions between yeast proteins, and functional consequences of nsSNPs (LS-SNP, http://salilab.org/LS-SNP).

INTRODUCTION

The sequencing efforts are providing us with complete genetic blueprints for hundreds of organisms, including humans. We are now faced with the challenge of assigning, investigating and modifying the functions of proteins encoded by these genomes. This task is generally facilitated by 3D structures of the proteins (1–3), which are best determined by experimental methods, such as X-ray crystallography and NMR spectroscopy. The number of experimentally determined structures deposited in the PDB increased from 23 096 to 31 823 over the last 2 years (August 2005) (4). However, the number of sequences in comprehensive sequence databases, such as UniProt (5) and GenPept (6), continues to grow even more rapidly, increasing from 1.2 to 2 million over the last 2 years (August 2005). Therefore, protein structure prediction is essential to obtain structural information for sequences where no experimental structure is available.

The most accurate models are generally obtained by homology or comparative modeling (7–10), a method that is applicable if an experimental structure related to a given target sequence is available. The fraction of sequences for which comparative models can be obtained automatically has increased moderately from 57 to 60% over the last 2 years, reflecting the counteracting effects of structural genomics (11,12) and many new genomic sequences.
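
As a rough check on the coverage figure quoted above, the following minimal sketch recomputes the fraction of modeled sequences and the average number of models per modeled sequence from the counts given in the abstract. The constant and function names are illustrative assumptions and are not part of MODBASE, MODPIPE or MODELLER.

```python
# Counts quoted in the abstract (UniProt snapshot of July 5, 2005).
RELIABLE_MODELS = 3_094_524      # reliable models for domains
MODELED_SEQUENCES = 1_094_750    # sequences with at least one reliable model
UNIQUE_SEQUENCES = 1_817_889     # unique sequences in UniProt


def coverage_fraction(modeled: int, total: int) -> float:
    """Return the fraction of sequences that have at least one model."""
    if total <= 0:
        raise ValueError("total must be positive")
    return modeled / total


def models_per_sequence(models: int, modeled: int) -> float:
    """Return the average number of models per modeled sequence."""
    if modeled <= 0:
        raise ValueError("modeled must be positive")
    return models / modeled


if __name__ == "__main__":
    coverage = coverage_fraction(MODELED_SEQUENCES, UNIQUE_SEQUENCES)
    density = models_per_sequence(RELIABLE_MODELS, MODELED_SEQUENCES)
    print(f"coverage: {coverage:.1%}")
    print(f"models per modeled sequence: {density:.2f}")
```

With the abstract's counts this gives a coverage of about 60.2%, in line with the 60% quoted in the text, and about 2.83 models per modeled sequence, which reflects that models are built per domain and per alignment rather than once per sequence.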

*To whom correspondence should be addressed. Tel: +1 415 514 4227; Fax: +1 415 514 4231; Email: [email protected]

© The Author 2006. Published by Oxford University Press. All rights reserved.

The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact [email protected]

Downloaded from https://academic.oup.com/nar/article-abstract/34/suppl_1/D291/1132895 by Pontificia Universidad Católica de Chile user on 08 June 2018

The process of comparative modeling usually requires the use of a number of programs to identify template structures, to generate sequence–structure alignments, to build the models and to evaluate them. In addition, various sequence and structure databases that are accessed by these programs are needed. Once an initial model is calculated, it is generally refined and ultimately analyzed in the context of many other related proteins and their functional annotations.

In this paper, we present MODBASE, a database of comparative protein structure models, and several associated databases and servers that facilitate these tasks for both expert and novice users. We highlight the improvements of MODBASE that were implemented since the last report (13), including updated modeling software, a completely redesigned user interface, incorporation of annotated human single point mutations, homology-based prediction of interacting proteins and improved access to information about structurally defined ligand binding sites and binary domain interfaces.

CONTENTS

Comparative modeling

Models in MODBASE are calculated using MODPIPE, our completely automated software pipeline for comparative modeling (14). MODPIPE can calculate comparative models for a large number of protein sequences, using many different template structures and sequence–structure alignments. MODPIPE relies on the various modules of MODELLER (15) for its functionality and is adapted for large-scale operation on a cluster of PCs using scripts written in PERL.

The templates used for model building consist of representative multiple structure alignments extracted from DBAli (16). These alignments were prepared by the SALIGN module of MODELLER (M. S. Madhusudhan, M. A. Marti-Renom and A. Sali, manuscript in preparation), which implements a multiple structure alignment method similar to that in the program COMPARER (17). Sequence profiles are constructed for both the target sequences and the templates by scanning against the UniProt database of sequences, relying on the BUILD_PROFILE module of MODELLER (N. Eswar, M. S. Madhusudhan and A. Sali, manuscript in preparation). BUILD_PROFILE is an iterative database searching protocol that relies on local dynamic programming and a robust method of estimation of alignment significance. Sequence–structure matches are established by aligning the target sequence profile against the template profiles, using local dynamic programming implemented in the PROFILE_PROFILE_SCAN module of MODELLER (18). Significant alignments covering distinct regions of the target sequence are chosen for modeling. Models are calculated for each of the sequence–structure matches using MODELLER. The resulting models are then evaluated by a composite model assessment criterion that depends on the compactness of a model, the sequence identity of the sequence–structure match and statistical energy Z-scores (D. Eramian, M.-Y. Shen, A. Sali, M. Marti-Renom, manuscript in preparation).

Model datasets

Models in MODBASE are organized into a number of datasets. The largest dataset contains models of all sequences in the UniProt database that are detectably related to at least one known structure in the PDB. This dataset is freely accessible to academic scientists. Currently, it contains 3 million models for domains in 1.1 million out of the 1.8 million unique sequences in the UniProt database (July 5, 2005), with an average length of 235 residues per model. For example, there are models for domains in 32 985 human sequences, 22 880 sequences from Arabidopsis thaliana, 15 195 sequences from Drosophila melanogaster and 9691 sequences from Escherichia coli. Other datasets include models calculated for the New York Structural GenomiX Research Consortium, datasets calculated by MODWEB and various datasets associated with our other modeling projects. Some of the older datasets were calculated with an earlier version of MODPIPE (19) based on single template structures and sequence–structure matches generated by PSI-BLAST (20) and IMPALA (21).

MODWEB

Closely connected to MODBASE is MODWEB, our comparative modeling web-server (http://salilab.org/modweb) (14). MODWEB accepts one or many sequences in the FASTA format and calculates their models using MODPIPE based on the best available templates from the PDB. Alternatively, MODWEB also accepts a protein structure as input and calculates models for all identifiable sequence homologs in the UniProt database. The latter mode is a useful tool for structural genomics efforts (22) to assess the impact of a newly determined protein structure on the modeling of sequences of unknown structure. It is also used to identify new members of sequence superfamilies with at least one member of known structure. The results of MODWEB calculations are available through the MODBASE interface as private datasets protected with passwords.

DBAli

DBAli (http://salilab.org/DBAli/) stores pairwise comparisons of all structures in the PDB calculated using the program MAMMOTH (23), as well as multiple structure alignments generated by the SALIGN module of MODELLER. DBAli is updated weekly. As of June 2005, DBAli contains more than 800 million pairwise comparisons and 8500 family-based multiple structure alignments for 22 000 non-redundant protein chains in the PDB. Several programs are used to provide additional information: (i) ModDom assigns domain boundaries from structure; (ii) ModClus allows the user to generate clusters of similar protein structures; and (iii) AnnoLyze and AnnoLite annotate the functions of proteins in DBAli. The DBAli tools help users to analyze the protein structure space by establishing relationships between protein structures and their fragments in a flexible and dynamic manner.

Predicted ligand binding sites

MODBASE stores a list of the binding sites of known structure for 100 000 ligands found in the PDB (24). The ligands include small molecules, such as metal ions, nucleotides, saccharides and peptides. Binding sites in all known structures are defined to consist of residues with at least one atom within 5 Å of any ligand atom. MODBASE also contains predicted binding sites on template structures that are inherited from any
related known structure if at least 75% of the binding site residues are within 4 Å of the template residues in a global superposition of the two structures in DBAli and if at least 75% of the binding site residue types are invariant. The putative ligand binding sites in the models are then mapped via the target-template alignments. The putative ligand binding sites are stored as SITE records and the binding site membership frequency per residue is indicated in the B-factor column of the model coordinate files. A total of 65% of MODBASE models have at least one predicted binding site.

PIBASE

PIBASE (http://salilab.org/pibase) is a comprehensive database of structurally defined interfaces between pairs of protein domains (25). It is composed of binary interfaces extracted from structures in the PDB and the Probable Quaternary Structure server PQS (26) using domain assignments from the Structural Classification of Proteins (27) and CATH (28) fold classification systems. PIBASE currently contains 158 915 interacting domain pairs between 105 061 domains from 2125 SCOP families. A diverse set of geometric, physicochemical and topologic properties are calculated for each complex, its domains, interfaces and binding sites. A subset of the interface properties is used to remove interface redundancy within PDB entries, resulting in 20 912 distinct domain–domain interfaces. The complexes are grouped into 989 topological classes based on their patterns of domain–domain contacts. The binary interfaces and their corresponding binding sites are categorized into 18 755 and 30 975 topological classes, respectively, based on the topology of secondary structure elements. PIBASE is a convenient resource for structural information on protein–protein interactions and is easily integrated with other databases. It is currently used by the DBAli function annotation module and the LS-SNP annotation system, and serves as a source of templates for the complex prediction module of MODBASE.

Predicted protein complexes

The composition and structure of protein complexes are predicted based on similarity to template complexes of known structure using comparative modeling. The structural models of the complexes are assessed with a statistical potential derived from binary domain interfaces obtained from PIBASE. The approach is applied to the 2434 yeast proteins with at least one structurally modeled domain, resulting in 3115 binary interaction predictions and 159 complex predictions of more than two proteins involving 506 and 71 proteins, respectively. The predictions are cross-referenced with the YeastGFP database of yeast protein subcellular localizations (29). A comparison of the predicted interactions to experimental results in the BIND database (30) reveals an overlap of 3%. The predictions are assigned confidence levels (Z-scores) that may be used to produce a list of testable hypotheses about interacting proteins. The estimated false positive rate at the Z-score threshold value of −1 is 16%.

LS-SNP

LS-SNP (http://www.salilab.org/LS-SNP) (31) is a database of annotated single nucleotide polymorphisms in human protein-coding exons that result in a changed amino acid residue type (non-synonymous SNPs or nsSNPs). The genomic locations of the SNPs were taken from the dbSNP database (32) and comprehensively mapped on the human proteins in the UniProt database via a collection of protein-to-mRNA and mRNA-to-genome alignments produced with the Known Genes algorithm (Fan Hsu, private communication). Using MODPIPE, we built comparative protein structure models for each significant alignment covering a distinct region of protein sequence (E-value cut-off 0.0001). We used the modeling results, in conjunction with PIBASE and the ligand tables of MODBASE, to infer which SNPs may destabilize protein quaternary structure or interfere with small molecule ligand binding. In addition, a support vector machine (33) that combines features of sequence, structure and evolutionary conservation was used to predict the SNPs that are associated with human disease. The resulting structural and functional annotations can be queried via the web interface from multiple viewpoints: single SNP, all SNPs in a gene or protein of interest, all nsSNPs in a genomic region of interest and all SNPs in a KEGG biochemical pathway (34). LS-SNP annotations are cross-linked with MODBASE and the UCSC Genome Browser. The comparative models used in LS-SNP as well as details of the SNP annotations are available through MODBASE.

Case study: modeling of translated protein sequences from EST data

MODBASE contains a new dataset of protein structure models for the expressed sequence tags (ESTs) from the Vitis vinifera genome (dataset Grape-1). While there are no protein structures from V. vinifera in the PDB, our dataset contains 5594 reliable models for domains of the most probable translated reading frames of 3144 EST sequences obtained from the UNIGENE database (32). The structural modeling of protein sequences implied by the EST data presented new challenges, including selection of the most probable reading frame and consideration of the coverage of the ESTs and the template structure in order to estimate the coverage of the target sequence. There are at least two reasons for building protein structure models directly from EST data: (i) a larger fraction of transcript variants is probably considered for modeling than when starting with entries in a protein sequence database; and (ii) assessment of models built for inferred protein sequences can contribute to gene annotation efforts by helping to decide which ones are actually translated and folded into a stable structure (35).

ACCESS AND INTERFACE

The main access to MODBASE is through its web interface at http://salilab.org/modbase, by querying with UniProt and GI identifiers, annotation keywords, PDB codes, datasets, sequence similarity to the modeled sequences (BLAST) and model specific criteria, such as model reliability, model size and target-template sequence identity. Additionally, it is possible to retrieve coordinate files, alignment files and ligand binding information in text files.

The output of a search is displayed on pages with varying amounts of information about the modeled sequences, template structures, alignments and functional annotations. An example for the output of a search resulting in one model is shown in Figure 1. A ribbon diagram of the model with the highest target-template sequence identity is displayed by default, together with details of the modeling calculation. Ribbon thumbprints of additional models for this sequence link to corresponding pages with more information. The ribbon diagrams are generated on the fly using Molscript (36) and Raster-3D (37). Additionally, cross-references to various databases, including PDB, UniProt, SwissProt/TrEMBL, PubMed and the UCSC Genome Browser, are provided. A pull-down menu provides links to additional functionality: the ligand binding module, the SNP module, retrieval of coordinate and alignment files, and the Molecular Modeling System Chimera (38) that allows the user to display template and model coordinates together with their alignment. Other MODBASE pages provide overviews of more than one sequence or structure. All MODBASE pages are interconnected to facilitate easy navigation between the different modules. Models are also directly accessible from other databases, including the SwissProt/TrEMBL sequence pages, PIR’s iProClass, EBI’s InterPro, the UCSC Genome Browser and PubMed (LinkOut).

Figure 1. MODBASE Model Details page (example S15435468 from the Grape-1 dataset): This page provides links to all models for a specific sequence. A ribbon diagram of the primary model, database annotations and modeling details are displayed. Links to additional models for different target regions or models from other datasets are displayed as thumbprints. The pull-down menu provides access to alternative MODBASE views and other types of information (if available), such as data about SNPs and putative ligand binding sites.

FUTURE DIRECTIONS

MODBASE will be updated monthly to reflect the growth of the sequence and structure databases, as well as improvements in the methods and software used for calculating the models.

CITATION

Users of MODBASE are requested to cite this article in their publications.

ACKNOWLEDGEMENTS

We are grateful to Tom Ferrin, Daniel Greenblatt, Conrad Huang and Tom Goddard for CHIMERA and contributing to the MODBASE/CHIMERA interface. For linking to MODBASE from their databases, we thank David Haussler, Jim Kent (UCSC Genome Browser), Amos Bairoch (SwissProt/TrEMBL), Rolf Apweiler (InterPro), Patsy Babbitt (SFLD), and Kathy Wu (PIR/iProClass). The project has been supported by NIH/NIGMS R01 GM 54762, NIH/NIGMS P50 GM62529, NIH/NIGMS U54 GM074945, NIH/NCI R33
CA84699, NIH GM 08284 (DE), FONDEF G02S1001 (FM), the Sandler Family Supporting Foundation, Sun Academic Equipment Grant EDUD-7824-020257-US, an IBM SUR grant and an Intel computer hardware gift. Funding to pay the Open Access publication charges for this article was provided by NIH/NIGMS U54 GM074945.

Conflict of interest statement. None declared.

REFERENCES

1. Domingues,F.S., Koppensteiner,W.A. and Sippl,M.J. (2000) The role of protein structure in genomics. FEBS Lett., 476, 98–102.
2. Brenner,S.E. and Levitt,M. (2000) Expectations from structural genomics. Protein Sci., 9, 197–200.
3. Skolnick,J., Fetrow,J.S. and Kolinski,A. (2000) Structural genomics and its importance for gene function analysis. Nat. Biotechnol., 18, 283–287.
4. Westbrook,J., Feng,Z., Jain,S., Bhat,T.N., Thanki,N., Ravichandran,V., Gilliland,G.L., Bluhm,W., Weissig,H., Greer,D.S. et al. (2002) The Protein Data Bank: unifying the archive. Nucleic Acids Res., 30, 245–248.
5. Bairoch,A., Apweiler,R., Wu,C.H., Barker,W.C., Boeckmann,B., Ferro,S., Gasteiger,E., Huang,H., Lopez,R., Magrane,M. et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res., 33, D154–D159.
6. Benson,D.A., Karsch-Mizrachi,I., Lipman,D.J., Ostell,J. and Wheeler,D.L. (2005) GenBank. Nucleic Acids Res., 33, D34–D38.
7. Baker,D. and Sali,A. (2001) Protein structure prediction and structural genomics. Science, 294, 93–96.
8. Wallner,B. and Elofsson,A. (2005) All are not equal: a benchmark of different homology modeling programs. Protein Sci., 14, 1315–1327.
9. Hillisch,A., Pineda,L.F. and Hilgenfeld,R. (2004) Utility of homology models in the drug discovery process. Drug Discov. Today, 9, 659–669.
10. Marti-Renom,M.A., Stuart,A.C., Fiser,A., Sanchez,R., Melo,F. and Sali,A. (2000) Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct., 29, 291–325.
11. Bonanno,J.B., Almo,S.C., Bresnick,A., Chance,M.R., Fiser,A., Swaminathan,S., Jiang,J., Studier,F.W., Shapiro,L., Lima,C.D. et al. (2005) New York-Structural GenomiX Research Consortium (NYSGXRC): a large scale center for the protein structure initiative. J. Struct. Funct. Genomics, 6, 225–232.
12. Xie,L. and Bourne,P.E. (2005) Functional coverage of the human genome by existing structures, structural genomics targets, and homology models. PLoS Comput. Biol., 1, e31.
13. Pieper,U., Eswar,N., Braberg,H., Madhusudhan,M.S., Davis,F.P., Stuart,A.C., Mirkovic,N., Rossi,A., Marti-Renom,M.A., Fiser,A. et al. (2004) MODBASE, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Res., 32, D217–D222.
14. Eswar,N., John,B., Mirkovic,N., Fiser,A., Ilyin,V.A., Pieper,U., Stuart,A.C., Marti-Renom,M.A., Madhusudhan,M.S., Yerkovich,B. et al. (2003) Tools for comparative protein structure modeling and analysis. Nucleic Acids Res., 31, 3375–3380.
15. Sali,A. and Blundell,T.L. (1993) Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol., 234, 779–815.
16. Marti-Renom,M.A., Ilyin,V.A. and Sali,A. (2001) DBAli: a database of protein structure alignments. Bioinformatics, 17, 746–747.
17. Zhu,Z.Y., Sali,A. and Blundell,T.L. (1992) A variable gap penalty function and feature weights for protein 3-D structure comparisons. Protein Eng., 5, 43–51.
18. Marti-Renom,M.A., Madhusudhan,M.S. and Sali,A. (2004) Alignment of protein sequences by their profiles. Protein Sci., 13, 1071–1087.
19. Pieper,U., Eswar,N., Stuart,A.C., Ilyin,V.A. and Sali,A. (2002) MODBASE, a database of annotated comparative protein structure models. Nucleic Acids Res., 30, 255–259.
20. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
21. Schaffer,A.A., Wolf,Y.I., Ponting,C.P., Koonin,E.V., Aravind,L. and Altschul,S.F. (1999) IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics, 15, 1000–1011.
22. Chance,M.R., Fiser,A., Sali,A., Pieper,U., Eswar,N., Xu,G., Fajardo,J.E., Radhakannan,T. and Marinkovic,N. (2004) High-throughput computational and experimental techniques in structural genomics. Genome Res., 14, 2145–2154.
23. Ortiz,A.R., Strauss,C.E. and Olmea,O. (2002) MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci., 11, 2606–2621.
24. Stuart,A.C., Ilyin,V.A. and Sali,A. (2002) LigBase: a database of families of aligned ligand binding sites in known protein sequences and structures. Bioinformatics, 18, 200–201.
25. Davis,F.P. and Sali,A. (2005) PIBASE: a comprehensive database of structurally defined protein interfaces. Bioinformatics, 21, 1901–1907.
26. Henrick,K. and Thornton,J.M. (1998) PQS: a protein quaternary structure file server. Trends Biochem. Sci., 23, 358–361.
27. Andreeva,A., Howorth,D., Brenner,S.E., Hubbard,T.J., Chothia,C. and Murzin,A.G. (2004) SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res., 32, D226–D229.
28. Pearl,F., Todd,A., Sillitoe,I., Dibley,M., Redfern,O., Lewis,T., Bennett,C., Marsden,R., Grant,A., Lee,D. et al. (2005) The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res., 33, D247–D251.
29. Ghaemmaghami,S., Huh,W.K., Bower,K., Howson,R.W., Belle,A., Dephoure,N., O’Shea,E.K. and Weissman,J.S. (2003) Global analysis of protein expression in yeast. Nature, 425, 737–741.
30. Alfarano,C., Andrade,C.E., Anthony,K., Bahroos,N., Bajec,M., Bantoft,K., Betel,D., Bobechko,B., Boutilier,K., Burgess,E. et al. (2005) The Biomolecular Interaction Network Database and related tools 2005 update. Nucleic Acids Res., 33, D418–D424.
31. Karchin,R., Diekhans,M., Kelly,L., Thomas,D.J., Pieper,U., Eswar,N., Haussler,D. and Sali,A. (2005) LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources. Bioinformatics, 21, 2814–2820.
32. Wheeler,D.L., Barrett,T., Benson,D.A., Bryant,S.H., Canese,K., Church,D.M., DiCuccio,M., Edgar,R., Federhen,S., Helmberg,W. et al. (2005) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 33, D39–D45.
33. Vapnik,V. and Chapelle,O. (2000) Bounds on error expectation for support vector machines. Neural Comput., 12, 2013–2036.
34. Ogata,H., Goto,S., Sato,K., Fujibuchi,W., Bono,H. and Kanehisa,M. (1999) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res., 27, 29–34.
35. Gopal,S., Schroeder,M., Pieper,U., Sczyrba,A., Aytekin-Kurban,G., Bekiranov,S., Fajardo,J.E., Eswar,N., Sanchez,R., Sali,A. et al. (2001) Homology-based annotation yields 1,042 new candidate genes in the Drosophila melanogaster genome. Nature Genet., 27, 337–340.
36. Kraulis,P.J. (1991) MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr., 24, 946–950.
37. Merritt,E.A. and Bacon,D.J. (1997) Raster3D: photorealistic molecular graphics. Methods Enzymol., 277, 505–524.
38. Pettersen,E.F., Goddard,T.D., Huang,C.C., Couch,G.S., Greenblatt,D.M., Meng,E.C. and Ferrin,T.E. (2004) UCSF Chimera - A Visualization System for Exploratory Research and Analysis. J. Comput. Chem., 25, 1605–1612.
