CASP5 Methods Abstracts
123D_server (P0476) - 68 predictions: 68 3D

123D: an Old Program for Fold Recognition

N. Alexandrov
Ceres, Inc., Malibu, CA, USA
[email protected]

I used the 123D+ web site at http://123d.ncifcrf.gov/ to make predictions. The predictions were fully automatic, with no manual intervention except for multi-domain proteins. For such proteins, the strongest local hit was cut out of the query sequence and the rest of the sequence was submitted again. The program 123D+ uses PSI-BLAST-generated profiles for both the query sequence and the fold library, secondary structure compatibility, and contact capacity potentials to find the optimal sequence-structure alignment. The fold library was constructed from the 40% non-redundant ASTRAL set of SCOP 1.59 domains.

Accelrys (P0210) - 24 predictions: 24 3D

Comparative Modeling Using GeneAtlas™

Dana Haley-Vicente, Velin Spassov, Tina Yeh, Ken Butenhof, Christoph Schneider, Azat Badretdinov and Lisa Yan
Accelrys Inc., 9685 Scranton Road, San Diego, CA 92121, USA
[email protected]

GeneAtlas™ (1) is a high-throughput pipeline for automated protein structure prediction and function annotation. For template structure identification it uses PSI-BLAST searches and our fold recognition program, SeqFold. To maximize homology recognition, both direct and reverse PSI-BLAST searches are performed and the hits are combined. Automated model building is carried out with Modeler, and models are evaluated using Profiles-3D Verify scores. For CASP5 targets, we first use GeneAtlas to help identify and select potential PDB templates; the alignments are then adjusted manually with the aid of various alignment tools (e.g. Align123) in the Homology module of InsightII. Align123 is based on ClustalW, augmented with a secondary structure match term added to the alignment score.
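The idea of adding a secondary structure match term to an alignment score can be illustrated with a minimal sketch. This is not the Align123 scoring function, which the abstract does not specify; the function name, the toy substitution table, the weight `ss_weight`, and the three-state H/E/C encoding are all assumptions made for illustration.

```python
# Hypothetical toy substitution scores; a real method would use a full matrix.
TOY_SUBST = {("L", "L"): 4.0, ("L", "V"): 1.0, ("V", "V"): 4.0}

def augmented_score(res_a, res_b, ss_a, ss_b, subst=TOY_SUBST, ss_weight=1.0):
    """Substitution score plus a bonus when secondary structure states agree.

    res_a, res_b: one-letter residue codes.
    ss_a, ss_b:   secondary structure states (assumed 'H', 'E' or 'C').
    ss_weight:    assumed weight of the secondary structure match term.
    """
    # Look the pair up in either order; unknown pairs score 0 in this sketch.
    key = (res_a, res_b) if (res_a, res_b) in subst else (res_b, res_a)
    base = subst.get(key, 0.0)
    bonus = ss_weight if ss_a == ss_b else 0.0
    return base + bonus
```

In a dynamic programming alignment, a score of this kind would replace the plain substitution score for each aligned residue pair, so that positions with matching predicted or observed secondary structure are favored.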
If multiple templates are used to build a model, structure-structure alignments are explored using InsightII's structure alignment tools, Modeler's MALIGN3D, and the protein structure alignment program CE. The sequence-structure alignment is then carried out with Modeler's Align2D. Multiple models are built with Modeler, including the new loop refinement routine based on the optimization of statistical pair potentials. Models are checked for proper stereochemistry and evaluated both by comparing the restraint violations reported by Modeler and by the Profiles-3D Verify scores, which measure the compatibility of each residue in the model with its environment.

In addition, some targets were selected to test two new methods that we have developed, ChiRotor and Looper, for side-chain and loop prediction. ChiRotor is a fast algorithm that predicts the conformation of all or part of the amino-acid side chains with an average RMSD of about 1 Å for the core residues. The loop-modeling program, Looper, produces a number of energy-minimized loop backbone conformations ranked according to force-field energy terms. Both algorithms combine a discrete search in dihedral angle space with CHARMm energy minimization.

1. Kitson et al. (2002) Functional annotation of proteomic sequences based on consensus of sequence and structural analysis. Briefings in Bioinformatics 3(1), 1-13.

Advanced-ONIZUKA (P0214) - 92 predictions: 92 3D

Fold Selection and Patchwork Energy Minimization

Kentaro Onizuka
Advanced Technology Research Laboratories, Matsushita Electric Industrial Co. Ltd.
[email protected]

The new method developed for CASP5 consists of three units.

1) Fold recognition unit

This unit selects ten to one hundred conformations that have relatively good compatibility with the target protein sequence from among approximately two thousand non-redundant protein structures collected from PDB release 100.
The selected conformations are aligned to the target protein sequence. The compatibility of a conformation with the target sequence is evaluated as the sum of multi-dimensional mean-force potentials between all possible pairs of residues in that conformation, with the target sequence aligned to it.

2) Patchwork energy minimization unit

This unit builds a protein conformation by concatenating structure segments cut out of the conformations selected by the fold recognition unit. The selected conformations are aligned to the target protein sequence. The concatenation of conformations is done as follows: 1) select two (i-th and j-th) conformations, each aligned to the target protein sequence; 2) choose a residue M in the sequence as the crossover point; 3) generate the new conformation by concatenating the segment from the N-terminus (of the target sequence) to M of the j-th conformation and the segment from M to the C-terminus (of the target sequence) of the i-th conformation.

The minimization algorithm is partially analogous to both a genetic algorithm and dynamic programming. The minimization procedure first sets several segment core residues, which must never be crossover points. The core residues are those having locally minimal energy, where the energy of each residue is calculated as the average energy (sum of potentials involving that residue) over all the selected conformations. The first concatenation step takes crossover points M between the N-terminus and the first segment core residue. For the i-th conformation, the combination of M and j that yields the conformation with minimal energy is selected. The k-th step takes M between the (k-1)-th and k-th core residues. The last step takes M between the last core residue and the C-terminal residue. Finally, the conformation with minimal energy is selected from the remaining new conformations as the result of the energy minimization.
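A single concatenation step of this kind can be sketched as follows. This is a minimal illustration and not the author's implementation: conformations are represented as plain lists with one placeholder entry per target residue, the energy function is supplied by the caller, and the names `crossover` and `best_crossover` are hypothetical. The abstract places residue M in both segments; the sketch arbitrarily assigns it to the i-th conformation.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Conformation = List[str]  # assumed encoding: one entry per target residue

def crossover(conf_j: Sequence[str], conf_i: Sequence[str], m: int) -> Conformation:
    """Take residues before index m from conf_j and residues from m on from conf_i."""
    return list(conf_j[:m]) + list(conf_i[m:])

def best_crossover(
    i: int,
    conformations: Sequence[Sequence[str]],
    candidate_points: Iterable[int],
    energy: Callable[[Conformation], float],
) -> Optional[Tuple[float, int, int, Conformation]]:
    """For conformation i, try every partner j and every allowed crossover
    point M, and return (energy, M, j, child) for the lowest-energy child.
    candidate_points should already exclude the segment core residues."""
    points = list(candidate_points)
    best = None
    for j, conf_j in enumerate(conformations):
        if j == i:
            continue
        for m in points:
            child = crossover(conf_j, conformations[i], m)
            e = energy(child)
            if best is None or e < best[0]:
                best = (e, m, j, child)
    return best

# Toy usage: 'x' marks an unfavourable residue environment (made-up data).
toy_confs = [list("aaxx"), list("xxaa"), list("xxxx")]
toy_energy = lambda conf: float(conf.count("x"))
result = best_crossover(1, toy_confs, range(1, 4), toy_energy)
```

Repeating this step with M restricted first to the region before the first core residue, then to each region between consecutive core residues, and finally to the region after the last core residue gives the stepwise procedure described above.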
3) Gap caulking unit

The protein conformation built by the patchwork energy minimization unit contains gaps arising from insertions or deletions in the alignment process. This unit tries to caulk those gaps by searching the conformations (selected by the fold recognition unit) for a combination of two gapless conformation segments in that region that can substitute for the conformation segments containing gaps.

The multi-dimensional mean force potentials $E_k^{ab}$ are pairwise potentials between two residues, defined with respect to the residue types a and b, the sequence separation k, and the six-dimensional relative configuration, whose components are 1) the distance between the two residues, 2) the direction of residue b from a, and 3) the orientation of b relative to a (three Euler angles). The fold recognition unit, however, first employs singleton potentials, which depend on only one residue type of the pair, to generate an energy profile for each conformation in the non-redundant conformation data set. The target sequence is then aligned to each profile using a dynamic programming algorithm. The compatibility of each conformation with the target sequence is evaluated by calculating the total energy, which is the sum of pairwise potentials according to that alignment.

The energy minimization unit employs pairwise potentials plus attractive force potentials, because energy minimization using only the net mean force potentials (1) generates an extended conformation rather than a compact one. The attractive potentials adopted here are proportional to the square of the distance between residues. The performance of the proposed minimization algorithm is intense, although the algorithm is not logically guaranteed to generate the optimal solution. The most difficult remaining problem is the choice of potentials for minimization.

1. Sippl M.J. (1990) Calculation of Conformational Ensembles from Potentials of Mean Force: An Approach to the Knowledge-based Prediction of Local Structure in Globular Proteins.
J. Mol. Biol., 213, 859-883.
2. Onizuka K., Noguchi T., Akiyama Y., Matsuda H. (2002) Using Data Compression for Multidimensional Distribution Analysis. Intelligent Systems May/June 2002, 48-54.

ALAX (P0234) - 39 predictions: 39 3D

A New Sequence Alignment Method ALAX and Its Application to Homology Modeling

Atsushi Hijikata¹, Tosiyuki Noguti² and Mitiko Go¹
¹ Division of Biological Science, Graduate School of Science, Nagoya University, ² Saga Medical School
[email protected]

One of the important issues in homology modeling is obtaining an accurate sequence alignment. This is particularly true when the sequence identity between the target and template proteins is low (less than 30%). At low sequence identity, one of the difficulties lies in locating the insertions/deletions (in/del) at the proper positions. To place the in/del at the correct locations, we developed a new sequence alignment method for protein pairs with weak identity in their amino acid sequences. A new gap penalty function was introduced that is based on the solvent accessibility of the corresponding amino acid residues of the template structure. In the new sequence alignment method, this gap penalty function is combined with the Position Specific Scoring Matrix (PSSM) of PSI-BLAST [1]. The alignment method is named ALAX (ALignment based on ACCessibility). In CASP5/CAFASP3 we used ALAX for template-target sequence alignment and the homology modeling software FAMS for model building, and obtained the target models through the following three steps.

1) Template structure selection

To identify a template structure, we ran five iterations of PSI-BLAST against the non-redundant protein sequence database (nr) of the NCBI. All sequences with an e-value lower than 0.1 were included in the PSSM construction. The PSSM was then used to search the PDB sequence database. The PDB sequence with the lowest e-value was selected as the template structure.
2) Target-template sequence alignment

To align the template and target sequences, we used ALAX with the solvent accessibility of the residues of the template structure and the PSSM constructed in step 1).

3) Model building

Finally, the model was built with the FAMS [2] program according to the alignment obtained by ALAX.

All the processes of homology modeling, 1) to 3), are fully automatic.

1. Altschul S.F. et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res.