Scoring Function for Automated Assessment of Protein Structure Template Quality

PROTEINS: Structure, Function, and Bioinformatics 57:702–710 (2004)

Yang Zhang and Jeffrey Skolnick*
Center of Excellence in Bioinformatics, University at Buffalo, Buffalo, New York

*Correspondence to: Jeffrey Skolnick, Center of Excellence in Bioinformatics, University at Buffalo, 901 Washington St., Buffalo, NY 14203. E-mail: [email protected].
Received 30 December 2003; Revised 16 March 2004, 25 May 2004; Accepted 9 June 2004
Published online 8 October 2004 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.20264

ABSTRACT

We have developed a new scoring function, the template modeling score (TM-score), to assess the quality of protein structure templates and predicted full-length models by extending the approaches used in Global Distance Test (GDT) [1] and MaxSub [2]. First, a protein size-dependent scale is exploited to eliminate the inherent protein size dependence of the previous scores and appropriately account for random protein structure pairs. Second, rather than setting specific distance cutoffs and calculating only the fractions with errors below the cutoff, all residue pairs in alignment/modeling are evaluated in the proposed score. For comparison of various scoring functions, we have constructed a large-scale benchmark set of structure templates for 1489 small to medium size proteins using the threading program PROSPECTOR_3 and built the full-length models using MODELLER and TASSER. The TM-score of the initial threading alignments, compared to the GDT and MaxSub scoring functions, shows a much stronger correlation to the quality of the final full-length models. The TM-score is further exploited as an assessment of all 'new fold' targets in the recent CASP5 experiment and shows a close coincidence with the results of human-expert visual assessment. These data suggest that the TM-score is a useful complement to the fully automated assessment of protein structure predictions. The executable program of TM-score is freely downloadable at http://bioinformatics.buffalo.edu/TM-score. Proteins 2004;57:702–710. © 2004 Wiley-Liss, Inc.

INTRODUCTION

The canonical comparative modeling/threading-based protein structure prediction procedure consists of two steps: (i) finding a solved structure related to the target sequence (i.e. template) [3–6] and (ii) building a full-length model based on the template [7–10]. The quality of the resulting full-length model is usually assessed by the root mean square deviation (RMSD) [11,12] between equivalent atoms in the model and native structures. However, RMSD alone is not sufficient to estimate the quality of the initial templates because the alignment coverage can be very different in different approaches [3–6]. Obviously, a template with a 2 Å RMSD to native having 50% alignment coverage is not necessarily better for structure modeling than one with a RMSD of 3 Å but having 80% alignment coverage. While the template aligned regions are better in the former because fewer residues are aligned, the resulting full-length model might be of poorer quality. The template assessment problem becomes particularly relevant during the development of efficient fold recognition algorithms, since different sequence–structure alignment schemes or parameters can result in various levels of alignment confidence with an associated loss or gain of alignment coverage [3–6]. Therefore, a single assessment score that has an appropriate balance of alignment accuracy and coverage and that is strongly related to the quality of the final full-length model is essential. Equally important, it must differentiate between a random and a statistically significant prediction.

Highly related to the above problems, several interesting scoring functions have been developed for the purpose of sequence-dependent comparison of two structures of different lengths (in contrast to sequence-independent structure alignment algorithms) [1,2,13–15]. For example, with MaxSub [2], Siew and coworkers tried to identify the maximum substructure in which the distances between equivalent residues of two structures after superposition are below some threshold value, such as 3.5 Å. Since the MaxSub scoring function only counts those residues included in the substructure, the spatial information of the templates outside the substructure is omitted. For example, Figure 1(a) shows the MaxSub superposition of the native structure of 2sas_ and the template alignment (94% coverage, which is the ratio of the number of aligned residues to the number of target residues) obtained from the threading program PROSPECTOR_3 [6], where residue pairs of distance <3.5 Å are highlighted in red (50% coverage), with the remainder of the aligned residues in yellow. The templates in Figure 1(b) (original alignment having 94% coverage) and Figure 1(c) (the 'well-aligned' part with 50% coverage) therefore have the same MaxSub-score, which is only associated with the set of red residues. However, the facility of the templates for the final full-length structure modeling can be significantly different. Using the structure building program MODELLER [7,10], for example, the template in Figure 1(b) results in a full-length model with a RMSD from native of 4.4 Å, while the template in Figure 1(c) results in a full-length model having a RMSD from native of 12.5 Å. By way of further illustration, in a large-scale benchmark set of PROSPECTOR_3 alignments (see below), there are 81 cases having MaxSub-scores between 0.4 and 0.45. The RMSD values of the final full-length models built from MODELLER vary from 3.5 to 35.7 Å with a standard deviation of 4.8 Å. Thus, there is no apparent correlation between the MaxSub-score and the quality of the resulting full-length model.

Fig. 1. (a) Superposition between the template and the native structure of 2sas from MaxSub. The native structure is shown as thin backbone in white; the template is shown as thick backbone in yellow, which is from Chain-A of 2scp in the PDB library [17] hit by PROSPECTOR_3 [6]. The residues with pairwise distances lower than 3.5 Å are highlighted in red for both template and native. (b) The whole template alignment has 94% coverage. (c) The substructure of the template alignment with a distance to native of less than 3.5 Å ('well-aligned' part) has 50% coverage. The templates in (b) and (c) have the same MaxSub score of 0.434. However, using MODELLER [7,10], the template in (b) results in a full-length model of RMSD 4.4 Å, while the template in (c) results in a full-length model of RMSD 12.5 Å.

In their GDT_TS scoring function, Zemla and coworkers [1,13] further identify multiple maximum substructures associated with several different threshold cutoffs (e.g. 1, 2, 4, and 8 Å as used in the recent CASP5 experiment [16]). The GDT_TS-score is defined as the average coverage of the target sequence of the substructures with the four different distance thresholds. Since the GDT_TS-score focuses only on the size of the substructures, the detailed match information of templates/models and native structures is partially missed (e.g. residues with deviations ranging from 4.1–8 Å from native have identical contributions to the scoring function). Zemla [13] further addressed this problem by introducing more distance thresholds.

Another problem associated with these score functions is the dependence of the score magnitudes on the evaluated proteins' size. In other words, one must address the issue of what the corresponding score value of a pair of randomly related structures would be. In Figure 2, we plot the average MaxSub- and GDT-scores as a function of the length of protein for random structure pairs in the Protein Data Bank (PDB) [17] that have pairwise sequence identity of less than 30%. These scores show a power-law dependence on the protein size. Obviously, a given absolute score, such as GDT = 0.4 or MaxSub = 0.3, can reflect a significant alignment for a target of 400 residues, but it is close to a random selection in the PDB for a target of 40 residues. This significant size dependence renders the absolute magnitude of these scoring functions meaningless.

Fig. 2. The average MaxSub-scores (triangles) and GDT-scores (circles) as a function of protein size. The data are calculated from all pairs of 3656 proteins taken from PDB library [17] that have <30% pairwise sequence identity. The statistical error bars are smaller than the size of the points. The solid and dashed lines are the nonlinear least square Marquardt–Levenberg fit of the MaxSub and GDT data to a power-law equation of f(L) = aL^b, where L is the length of the smaller protein of the corresponding structure pair. The fit parameters a and b are as indicated.

The significant protein size dependence of structure similarity for randomly related structure pairs has also been observed by many authors when structure similarity is measured by structure alignment [18,19] or RMSD calculation [20,21]. To eliminate the dependence on protein size, Levitt and Gerstein [18] and Ortiz and coworkers [19] converted their structure alignment score into a statistical significance score, called the P-value, on the basis of the statistics of their random structure database. For their relative RMSD, Betancourt and Skolnick [20] normalized the RMSD by the average RMSD from random structure pairs with similar size and radii of gyration. In the RMSD-100 score [21], Carugo and Pongor divided the RMSD by a factor of 1 + √(N/100), with N representing the protein length.

In this article, we extend the above approaches and develop a new scoring function for the assessment of threading templates, which we call the template modeling (TM) score.
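
The threshold-averaging idea behind GDT_TS, and the information it discards, can be illustrated with a small Python sketch. This is not the GDT or MaxSub program: it assumes the per-residue distances are already given for one fixed superposition, whereas the real methods search for the maximum substructure at each cutoff. The function names (coverage_below, gdt_ts_like), the use of a strict "<" comparison, and the two toy distance lists are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def coverage_below(distances: Sequence[float], target_length: int, cutoff: float) -> float:
    """Fraction of target residues whose aligned-pair distance is below cutoff (in Å).

    Unaligned target residues simply have no entry in `distances`, so they
    lower the coverage through the `target_length` denominator.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return sum(1 for d in distances if d < cutoff) / target_length


def gdt_ts_like(
    distances: Sequence[float],
    target_length: int,
    cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0),
) -> float:
    """Average coverage over several distance cutoffs (simplified, fixed superposition)."""
    return sum(coverage_below(distances, target_length, c) for c in cutoffs) / len(cutoffs)


# Two hypothetical templates for a 10-residue target, fully aligned.
# They differ only in the last four residues: 4.1 Å versus 7.9 Å from native.
template_a = [0.5, 0.8, 1.5, 1.8, 3.0, 3.5, 4.1, 4.1, 4.1, 4.1]
template_b = [0.5, 0.8, 1.5, 1.8, 3.0, 3.5, 7.9, 7.9, 7.9, 7.9]

score_a = gdt_ts_like(template_a, target_length=10)
score_b = gdt_ts_like(template_b, target_length=10)

print(f"template A: {score_a:.2f}")  # prints "template A: 0.55"
print(f"template B: {score_b:.2f}")  # prints "template B: 0.55"
```

Both toy templates receive the same score even though the second one is clearly worse in its last four residues, because every deviation between 4 and 8 Å falls into the same bucket. A score that evaluates every residue pair by its actual distance would separate the two.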