Development of Novel Strategies for Template-Based Protein Structure Prediction
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
-
Predicting and Characterising Protein-Protein Complexes
Predicting and Characterising Protein-Protein Complexes Iain Hervé Moal June 2011 Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute and Department of Biochemistry and Molecular Biology, University College London A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Biochemistry at the University College London. I, Iain Hervé Moal, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis. Abstract Macromolecular interactions play a key role in all life processes. The construction and annotation of protein interaction networks is pivotal for the understanding of these processes, and how their perturbation leads to disease. However the extent of the human interactome and the limitations of the experimental techniques which can be brought to bear upon it necessitate theoretical approaches. Presented here are computational investigations into the interactions between biological macromolecules, focusing on the structural prediction of interactions, docking, and their kinetic and thermodynamic characterisation via empirical functions. Firstly, the use of normal modes in docking is investigated. Vibrational analysis of proteins is shown to indicate the motions which proteins are intrinsically disposed to undertake, and the use of this information to model flexible deformations upon protein-protein binding is evaluated. Subsequently SwarmDock, a docking algorithm which models flexibility as a linear combination of normal modes, is presented and benchmarked on a wide variety of test cases. This algorithm utilises state of the art energy functions and metaheuristics to navigate the free energy landscape. -
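The idea of modelling flexibility as a linear combination of normal modes can be illustrated with a minimal Python sketch. It is not taken from the thesis and is not SwarmDock's implementation: the function name `deform_along_modes`, the list-of-lists data layout, and the toy modes are all assumptions made for illustration. Each mode is taken to be a per-atom displacement vector, and the deformed structure is the starting coordinates plus the amplitude-weighted sum of the modes.

```python
def deform_along_modes(coords, modes, amplitudes):
    """Displace coordinates by sum_k amplitudes[k] * modes[k].

    coords:     list of [x, y, z] per atom
    modes:      list of modes, each a list of [dx, dy, dz] per atom
    amplitudes: one coefficient per mode (hypothetical search variables)
    """
    if len(modes) != len(amplitudes):
        raise ValueError("need exactly one amplitude per mode")
    # Copy so the input structure is left untouched.
    displaced = [list(atom) for atom in coords]
    for mode, amplitude in zip(modes, amplitudes):
        if len(mode) != len(coords):
            raise ValueError("each mode needs one displacement per atom")
        for atom, shift in zip(displaced, mode):
            for axis in range(3):
                atom[axis] += amplitude * shift[axis]
    return displaced


# Toy example: two atoms, one "stretching" mode and one rigid shift along y.
start = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
toy_modes = [
    [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
]
deformed = deform_along_modes(start, toy_modes, [0.5, 2.0])
```

In a docking search of this kind, the amplitudes would be among the variables the optimiser adjusts, which keeps the flexible search space small compared with moving every atom independently.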
Functional Effects Detailed Research Plan
GeCIP Detailed Research Plan Form Background The Genomics England Clinical Interpretation Partnership (GeCIP) brings together researchers, clinicians and trainees from both academia and the NHS to analyse, refine and make new discoveries from the data from the 100,000 Genomes Project. The aims of the partnerships are: 1. To optimise: • clinical data and sample collection • clinical reporting • data validation and interpretation. 2. To improve understanding of the implications of genomic findings and improve the accuracy and reliability of information fed back to patients. To add to knowledge of the genetic basis of disease. 3. To provide a sustainable thriving training environment. The initial wave of GeCIP domains was announced in June 2015 following a first round of applications in January 2015. On the 18th June 2015 we invited the inaugurated GeCIP domains to develop more detailed research plans working closely with Genomics England. These will be used to ensure that the plans are complementary and add real value across the GeCIP portfolio and address the aims and objectives of the 100,000 Genomes Project. They will be shared with the MRC, Wellcome Trust, NIHR and Cancer Research UK as existing members of the GeCIP Board to give advance warning and manage funding requests to maximise the funds available to each domain. However, formal applications will then be required to be submitted to individual funders. They will allow Genomics England to plan shared core analyses and the required research and computing infrastructure to support the proposed research. They will also form the basis of assessment by the Project’s Access Review Committee, to permit access to data. -
Centre for Bioinformatics Imperial College London
Centre for Bioinformatics Imperial College London Inaugural Report January 2003 Centre Director: Prof Michael Sternberg www.imperial.ac.uk/bioinformatics [email protected] Support Service Head: Dr Sarah Butcher www.codon.bioinformatics.ic.ac.uk [email protected] Contents: Summary; 1. Background and Objectives; 1.1 Background; 1.2 Objectives of the Centre for Bioinformatics; 1.3 Objectives of the Bioinformatics Support Service; 1.4 Historical Notes; 2. Management; 3. Biographies of the Team; 4. Bioinformatics Research at Imperial; 4.1 Affiliates of the Centre; 4.2 Research Grants -
Centre for Bioinformatics Imperial College London Second Report 31
Centre for Bioinformatics Imperial College London Second Report 31 May 2004 Centre Director: Prof Michael Sternberg www.imperial.ac.uk/bioinformatics [email protected] Support Service Head: Dr Sarah Butcher www.codon.bioinformatics.ic.ac.uk [email protected] Contents: Summary; 1. Background and Objectives; 1.1 Objectives of the Report; 1.2 Background; 1.3 Objectives of the Centre for Bioinformatics; 1.4 Objectives of the Bioinformatics Support Service; 2. Management; 3. Biographies of the Team; 4. Bioinformatics Research at Imperial; 4.1 Affiliates of the Centre; 4.2 Research -
Spinout Equinox Pharma Speeds up and Reduces the Cost of Drug Discovery
Spinout Equinox Pharma speeds up and reduces the cost of drug discovery Spinout Equinox Pharma1 provides a service to the pharma and biopharma industries that helps in the development of new drugs. The company, established by researchers from Imperial College London in 2008, is using its own proprietary software to help speed up and reduce the cost of discovery processes. Equinox is based on BBSRC-funded collaborative research conducted by Professors Stephen Muggleton2, Royal Academy Chair in Machine Learning, Mike Sternberg3, Chair of Structural Bioinformatics, and Paul Freemont4, Chair of Protein Crystallography, at Imperial College London. The technology developed by the company takes a logic-based approach to discover novel drugs, using computers to learn from a set of biologically-active molecules. It combines knowledge about traditional drug discovery methods and computer-based analyses, and focuses on the key stages in The technology also has applications in the agrochemical sector, and is being used by global agri-tech company Syngenta. “The power of a logic-based approach is that it can propose chemically-novel molecules that can have enhanced properties and are able to be the subject of novel ‘composition of matter’ patents,” says Muggleton. Proving the technology A three-year BBSRC grant in 2003 funded the work of IMPACT SUMMARY Spinout Equinox Pharma provides services to the agri-tech and pharmaceuticals industries to accelerate the discovery of molecules that could form the basis of new products such as drugs or herbicides. The company arose from BBSRC bioinformatics and structural biology research at Imperial College London. Technology commercialisation company NetScientific Ltd has invested £250K in Equinox. Equinox customers include major pharmaceutical companies such as Japanese pharmaceutical company Astellas and agro chemical companies such as Syngenta, as well as SMEs based in the UK, Europe, Japan and the USA. -
The Phyre2 Web Portal for Protein Modeling, Prediction and Analysis
PROTOCOL The Phyre2 web portal for protein modeling, prediction and analysis Lawrence A Kelley1, Stefans Mezulis1, Christopher M Yates1,2, Mark N Wass1,2 & Michael J E Sternberg1 1Structural Bioinformatics Group, Imperial College London, London, UK. 2Present addresses: University College London (UCL) Cancer Institute, London, UK (C.M.Y.); Centre for Molecular Processing, School of Biosciences, University of Kent, Kent, UK (M.N.W.). Correspondence should be addressed to L.A.K. ([email protected]). Published online 7 May 2015; doi:10.1038/nprot.2015.053 Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user’s protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large number of sequences at once and to automatically run weekly searches for proteins that are difficult to model. -
Statistical Inference for Template-Based Protein Structure
Statistical Inference for template-based protein structure prediction by Jian Peng Submitted to: Toyota Technological Institute at Chicago 6045 S. Kenwood Ave, Chicago, IL, 60637 May 2013 For the degree of Doctor of Philosophy in Computer Science Thesis Committee: Jinbo Xu (Thesis Supervisor) David McAllester Daisuke Kihara Abstract Protein structure prediction is one of the most important problems in computational biology. The most successful computational approach, also called template-based modeling, identifies templates with solved crystal structures for the query proteins and constructs three dimensional models based on sequence/structure alignments. Although substantial effort has been made to improve protein sequence alignment, the accuracy of alignments between distantly related proteins is still unsatisfactory. In this thesis, I will introduce a number of statistical machine learning methods to build accurate alignments between a protein sequence and its template structures, especially for proteins having only distantly related templates. For a protein with only one good template, we develop a regression-tree based Conditional Random Fields (CRF) model for pairwise protein sequence/structure alignment. By learning a nonlinear threading scoring function, we are able to leverage the correlation among different sequence and structural features. We also introduce an information-theoretic measure to guide the learning algorithm to better exploit the structural features for low-homology proteins with little evolutionary information in their sequence profile. -
Deep Learning-Based Advances in Protein Structure Prediction
International Journal of Molecular Sciences Review Deep Learning-Based Advances in Protein Structure Prediction Subash C. Pakhrin 1,†, Bikash Shrestha 2,†, Badri Adhikari 2,* and Dukka B. KC 1,* 1 Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA; [email protected] 2 Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA; [email protected] * Correspondence: [email protected] (B.A.); [email protected] (D.B.K.) † Both the authors should be considered equal first authors. Abstract: Obtaining an accurate description of protein structure is a fundamental step toward understanding the underpinning of biology. Although recent advances in experimental approaches have greatly enhanced our capabilities to experimentally determine protein structures, the gap between the number of protein sequences and known protein structures is ever increasing. Computational protein structure prediction is one of the ways to fill this gap. Recently, the protein structure prediction field has witnessed a lot of advances due to Deep Learning (DL)-based approaches as evidenced by the success of AlphaFold2 in the most recent Critical Assessment of protein Structure Prediction (CASP14). In this article, we highlight important milestones and progresses in the field of protein structure prediction due to DL-based methods as observed in CASP experiments. We describe advances in various steps of protein structure prediction pipeline viz. protein contact map prediction, protein distogram prediction, protein real-valued distance prediction, and Quality Assessment/refinement. We also highlight some end-to-end DL-based approaches for protein structure prediction approaches. Additionally, as there have been some recent DL-based advances in protein structure determination using Cryo-Electron (Cryo-EM) microscopy based, we also highlight some of the important progress
Citation: Pakhrin, S.C.; Shrestha, B.; Adhikari, B.; KC, D.B. -
RaptorX: Exploiting Structure Information for Protein Alignment by Statistical Inference Jian Peng and Jinbo Xu*
PROTEINS: Structure, Function, Bioinformatics Prediction Methods and Reports RaptorX: Exploiting structure information for protein alignment by statistical inference Jian Peng and Jinbo Xu* Toyota Technological Institute at Chicago, Chicago, Illinois ABSTRACT This work presents RaptorX, a statistical method for template-based protein modeling that improves alignment accuracy by exploiting structural information in a single or multiple templates. RaptorX consists of three major components: single-template threading, alignment quality prediction, and multiple-template threading. This work summarizes the methods used by RaptorX and presents its CASP9 result analysis, aiming to identify major bottlenecks with RaptorX and template-based modeling and hopefully directions for further INTRODUCTION RaptorX is totally different from our previous threading program RAPTOR, which aligns a sequence to a template by using linear programming to minimize a given threading scoring function.1,2 By contrast, RaptorX uses a statistical learning method to design a new threading scoring function, aiming at better measuring the compatibility between a target sequence and a template structure. In addition to single-template threading, RaptorX also has a multiple-template threading component and contains a new module for alignment quality prediction. Our results show that RaptorX indeed has much better alignment accuracy than RAPTOR.3,4 RaptorX is designed to address two “alignment” challenges facing template-based protein modeling. One is how to align a target to its template when they have a sparse sequence profile (i.e., no sufficient amount of information in homologs). In this case, a profile-based alignment method may not work well. The other is how to improve sequence-template alignment accuracy using more reliable template structural alignments as bridge when at least two similar templates are available for a target. -
Distance-Based Protein Folding Powered by Deep Learning Jinbo Xu Toyota Technological Institute at Chicago 6045 S Kenwood, IL, 60637, USA [email protected]
Distance-based Protein Folding Powered by Deep Learning Jinbo Xu Toyota Technological Institute at Chicago 6045 S Kenwood, IL, 60637, USA [email protected] Contact-assisted protein folding has made very good progress, but two challenges remain. One is accurate contact prediction for proteins lacking many sequence homologs and the other is that time-consuming folding simulation is often needed to predict good 3D models from predicted contacts. We show that a protein distance matrix can be predicted well by deep learning and then directly used to construct 3D models without folding simulation at all. Using distance geometry to construct 3D models from our predicted distance matrices, we successfully folded 21 of the 37 CASP12 hard targets with a median family size of 58 effective sequence homologs within 4 hours on a Linux computer of 20 CPUs. In contrast, contacts predicted by direct coupling analysis (DCA) cannot fold any of them in the absence of folding simulation and the best CASP12 group folded 11 of them by integrating predicted contacts into complex, fragment-based folding simulation. The rigorous experimental validation on 15 CASP13 targets shows that among the 3 hardest targets of new fold our distance-based folding servers successfully folded 2 large ones with <150 sequence homologs while the other servers failed on all three, and that our ab initio folding server also predicted the best, high-quality 3D model for a large homology modeling target. Further experimental validation in CAMEO shows that our ab initio folding server predicted the correct fold for a membrane protein of new fold with 200 residues and 229 sequence homologs while all the other servers failed. -
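How a distance matrix alone can determine 3D coordinates may be easier to see in a small Python sketch. It is not the paper's method: it assumes an exact, noise-free distance matrix whose points span at most three dimensions, and it recovers coordinates by a pivoted Cholesky factorisation of the Gram matrix (a simpler stand-in for the eigendecomposition used in classical multidimensional scaling). The function name `embed_from_distances` is hypothetical.

```python
import math


def embed_from_distances(dist, dim=3):
    """Recover coordinates (up to rotation/reflection) from exact distances.

    dist: symmetric n x n matrix of pairwise Euclidean distances.
    Point 0 is placed at the origin.
    """
    n = len(dist)
    # Gram matrix of position vectors relative to point 0:
    # g[i][j] = (d0i^2 + d0j^2 - dij^2) / 2 = (xi - x0) . (xj - x0)
    gram = [
        [0.5 * (dist[0][i] ** 2 + dist[0][j] ** 2 - dist[i][j] ** 2)
         for j in range(n)]
        for i in range(n)
    ]
    coords = [[0.0] * dim for _ in range(n)]
    residual = [gram[i][i] for i in range(n)]  # unexplained squared norms
    for k in range(dim):
        # Pivot on the point with the largest remaining squared norm.
        pivot = max(range(n), key=lambda i: residual[i])
        if residual[pivot] <= 1e-12:
            break  # the points span fewer than `dim` dimensions
        root = math.sqrt(residual[pivot])
        for i in range(n):
            explained = sum(coords[i][m] * coords[pivot][m] for m in range(k))
            coords[i][k] = (gram[i][pivot] - explained) / root
        for i in range(n):
            residual[i] -= coords[i][k] ** 2
    return coords


# Usage: build distances from known points, then re-embed them.
points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (1, 1, 1)]
distances = [[math.dist(p, q) for q in points] for p in points]
model = embed_from_distances(distances)
```

Distances cannot distinguish a structure from its mirror image, so the embedding is only defined up to reflection. Predicted distances are noisy and generally not exactly embeddable in 3D, so a practical pipeline would need an error-tolerant optimisation rather than this exact factorisation.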
Virtual Screening of Human Class-A GPCRs Using Ligand Profiles Built on Multiple Ligand-Receptor Interactions
Article Virtual Screening of Human Class-A GPCRs Using Ligand Profiles Built on Multiple Ligand–Receptor Interactions Wallace K.B. Chan 1,2 and Yang Zhang 1,3 1 - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA 2 - Department of Pharmacology, University of Michigan, Ann Arbor, MI 48109, USA 3 - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA Correspondence to Yang Zhang: Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218, USA. [email protected] https://doi.org/10.1016/j.jmb.2020.07.003 Edited by Michael Sternberg Abstract G protein-coupled receptors (GPCRs) are a large family of integral membrane proteins responsible for cellular signal transductions. Identification of therapeutic compounds to regulate physiological processes is an important first step of drug discovery. We proposed MAGELLAN, a novel hierarchical virtual-screening (VS) pipeline, which starts with low-resolution protein structure prediction and structure-based binding-site identification, followed by homologous GPCR detections through structure and orthosteric binding-site comparisons. Ligand profiles constructed from the homologous ligand–GPCR complexes are then used to thread through compound databases for VS. The pipeline was first tested in a large-scale retrospective screening experiment against 224 human Class A GPCRs, where MAGELLAN achieved a median enrichment factor (EF) of 14.38, significantly higher than that using individual ligand profiles. 
Next, MAGELLAN was examined on 5 and 20 GPCRs from two public VS databases (DUD-E and GPCR-Bench) and resulted in an average EF of 9.75 and 13.70, respectively, which compare favorably with other state-of-the-art docking- and ligand-based methods, including AutoDock Vina (with EF = 1.48/3.16 in DUD-E and GPCR-Bench), DOCK 6 (2.12/3.47 in DUD-E and GPCR-Bench), PoLi (2.2 in DUD-E), and FINDSITEcomb2.0 (2.90 in DUD-E). -
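The enrichment factor (EF) quoted above is not defined in this excerpt. A minimal Python sketch of one commonly used definition follows, as an assumption rather than the paper's exact formula: the hit rate among the top-ranked fraction of the library divided by the hit rate of the whole library. The function name `enrichment_factor`, the `top_fraction` parameter and the toy data are hypothetical.

```python
def enrichment_factor(scores, is_active, top_fraction=0.01):
    """EF = (actives in top fraction / size of top fraction)
            / (all actives / library size).

    scores:    one score per compound, higher meaning "more likely active"
    is_active: matching list of booleans (known actives vs. decoys)
    """
    n = len(scores)
    if n == 0 or n != len(is_active):
        raise ValueError("scores and is_active must be non-empty and aligned")
    total_actives = sum(1 for flag in is_active if flag)
    if total_actives == 0:
        raise ValueError("EF is undefined without any active compounds")
    n_top = max(1, int(round(n * top_fraction)))
    # Rank compound indices from best to worst score.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_in_top = sum(1 for i in ranked[:n_top] if is_active[i])
    return (hits_in_top / n_top) / (total_actives / n)


# Toy library: 100 compounds, 10 actives, 5 of them ranked in the top 10.
toy_scores = [100 - i for i in range(100)]
toy_labels = [i < 5 or 50 <= i < 55 for i in range(100)]
ef_top10 = enrichment_factor(toy_scores, toy_labels, top_fraction=0.10)
```

Under this definition an EF of 1 means the ranking is no better than random selection, so the value depends on the cutoff fraction chosen, which the excerpt does not state.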
Template-Based Protein Modeling Using the RaptorX Web Server
PROTOCOL Template-based protein structure modeling using the RaptorX web server Morten Källberg1–3, Haipeng Wang1,3, Sheng Wang1, Jian Peng1, Zhiyong Wang1, Hui Lu2 & Jinbo Xu1 1Toyota Technological Institute at Chicago, Chicago, Illinois, USA. 2Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois, USA. 3These authors contributed equally to this work. Correspondence should be addressed to J.X. ([email protected]). Published online 19 July 2012; doi:10.1038/nprot.2012.085 A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ~35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6,000 sequences submitted by ~1,600 users from around the world. INTRODUCTION Proteomes constitute the backbone of cellular function by carrying out the tasks encoded in the genes expressed by a given cell type. used by threading methods8,9 such as MUSTER10, SPARKS11,12 and