Using C-Alpha Geometry to Describe Protein Secondary Structure and Motifs

by

Christopher Joseph Williams

Department of Biochemistry
Duke University

Approved:
David C. Richardson, Co-Supervisor
Jane S. Richardson, Co-Supervisor
Charles William Carter, Jr.
Harold P. Erickson
Terrence G. Oas
Maria A. Schumacher

Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Biochemistry in the Graduate School of Duke University

2015

Copyright © 2015 by Christopher Joseph Williams
All rights reserved except the rights granted by the Creative Commons Attribution-Noncommercial License

Abstract

Three-dimensional atomic models from X-ray crystallography are used in many research areas to understand and manipulate protein structure. Both research and applications depend on the quality of these models. Low-resolution experimental data are a common problem in crystallography. They make it difficult to solve structures and to produce the reliable models that many scientists depend on.
In this work, I develop new, automated tools for the validation and correction of low-resolution structures. These tools are gathered under the name CaBLAM, for C-alpha Based Low-resolution Annotation Method. CaBLAM uses a unique, Cα-geometry-based parameter space to identify outliers in protein backbone geometry and to identify secondary structure that may be masked by modeling errors. CaBLAM was developed in the Python programming language as part of the Phenix crystallography suite and the open CCTBX Project, and it makes use of the architecture and methods available in the CCTBX toolbox. Quality-filtered databases of high-resolution protein structures, especially the Top8000, were used to construct contours of expected protein behavior for CaBLAM. CaBLAM has also been integrated into the codebase for the Richardson Lab's online MolProbity validation service.

CaBLAM provides useful validation feedback for protein structures in the 2.5-4.0 Å resolution range. This success demonstrates that the Cα trace of a protein is relatively reliable in this resolution range. Full mainchain information can be extrapolated from the Cα trace, especially for regular secondary structure elements. CaBLAM has also informed our approach to validating low-resolution structures. One of our goals is to moderate validation feedback, both to reduce validation overload and to focus user attention on modeling errors that are significant and correctable. CaBLAM and the related methods that have grown around it demonstrate progress toward this goal.

Dedication

Ars gratia artis. Mens gratia mentis.

Contents

Abstract
List of Tables
List of Figures
Acknowledgements
1. Introduction
2. CASP Retrospective
2.1 The CASP experiment
2.2 My CASP tools
2.2.1 Completeness
2.2.2 Sidechain alignment
2.3 Lessons from CASP
2.3.1 Mainchain reality score
2.3.2 Adjusted clash cutoff
2.4 Discussion
3. Challenges of low-resolution protein modeling
3.1 Causes of low-resolution data
3.1.1 Resolution
3.1.2 Crystal quality
3.1.3 Mobility and disorder
3.1.4 Correlation of Interest and Difficulty
3.2 Data quality
3.2.1 Ambiguous density
3.2.2 Misleading density
3.2.3 Missing density
3.2.4 Core versus surface density
3.3 Missing and truncated sidechains
3.4 Loops
3.5 Data versus geometry restraints
3.6 Discussion
4. The Cα geometry parameter spaces
4.1 Problems with all-atom parameter spaces
4.1.1 Ramachandran analysis at low resolution
4.1.2 DSSP annotation at low resolution
4.2 Developing the Cα parameter space
4.2.1 Cα pseudodihedrals
4.2.2 The peptide plane dihedral
4.2.3 The Cα virtual angle
4.3 Representations of the parameter space
4.3.1 3D CaBLAM space
4.3.2 2D CaBLAM space
4.3.3 3D Cα geometry space
4.4 Populating the 3D parameter spaces
4.4.1 Dataset selection
4.4.2 Residue-level quality filtering
4.4.3 3D parameter spaces
4.5 DSSP letter-code prediction with CaBLAM
4.5.1 Dataset selection