Universal architectural concepts underlying protein folding patterns

Arthur M. Lesk(a,b), Ramanan Subramanian(c), Lloyd Allison(c), David Abramson(d), Peter J. Stuckey(c,e), Maria Garcia de la Banda(c), and Arun S. Konagurthu(c,*)

(a) MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, U.K.; (b) Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, U.S.A.; (c) Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia; (d) Research Computing Center, University of Queensland, Brisbane, QLD 4072, Australia; (e) School of Computing and Information Systems, University of Melbourne, VIC 3010, Australia

bioRxiv preprint doi: https://doi.org/10.1101/480194; this version posted November 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

ABSTRACT

What is the architectural ‘basis set’ of the observed universe of protein structures? Using information-theoretic inference, we answer this question with a comprehensive dictionary of 1,493 substructural concepts. Each concept represents a topologically conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the world-wide Protein Data Bank and completely inventoried all concept instances. This yields an unprecedented source of biological insights.
These include correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino-acid sequence–structure correlations, useful for ab initio structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. An interactive site, PROÇODIC, at http://lcb.infotech.monash.edu.au/prosodic provides access to and navigation of the entire dictionary of concepts and all associated information.

KEYWORDS: architectural concepts, protein building blocks, structural motifs, minimum message length, MML, lossless compression, information theory

INTRODUCTION

The polypeptide chains of amino acids (primary structure) in most proteins fold into helices and strands of sheet (secondary structure), which in turn assemble to give proteins their intricate three-dimensional shapes and folding patterns (tertiary structure). Experimental methods have already provided over 140,000 entries in the world-wide Protein Data Bank (wwPDB), containing the three-dimensional coordinates of proteins and protein-nucleic acid complexes from a wide range of species. Unravelling protein architecture and discovering the relationships among these three major levels of structural description are key to understanding how proteins function, how their 3D folding patterns form, and how they evolve (1).

Investigations of protein folding patterns have revealed recurrent themes at all structural levels (2–8), which form the basis for widely used hierarchical classifications of protein structures (9–11). Nevertheless, many aspects of the relationships across structural levels remain unresolved. Chothia and Lesk (6) introduced the idea of a core of the folding patterns of homologous proteins. This core comprises a maximal set of secondary structural elements that assemble in a common 3D topology, while withstanding a certain amount of distortion.
* To whom correspondence should be addressed.

Conceptualization: ASK; Methodology: AML, LA, DA, PJS, MG, and ASK; Software: RS and ASK; Validation: AML, LA, and ASK; Analysis: AML, LA and ASK; Investigation: AML, RS, PJS, MG and ASK; Resources: AML and DA; Data Curation: ASK; Writing - Original Draft: AML and ASK; Writing - Review & Editing: RS, LA, DA, PJS and MG; Visualization: AML and ASK; Supervision: ASK; Project Administration: ASK; Funding Acquisition: AML, PJS, MG and ASK.

November 25, 2018 | 1–20

The parts outside the core are structurally more variable. Many related proteins contain some, but not all, of the common substructures that form their cores. Therefore, it is of crucial interest to discover the nature of the substructures that contribute to the cores of protein families. Some of these are supersecondary structures: small conserved combinations of successive elements of secondary structure, such as the β-α-β subunit. Supersecondary structures recur within many protein folds and can be shared even by unrelated proteins. For example, the β-α-β subunit appears in NAD-binding domains, in TIM barrels, and in many other proteins. Early definitions of supersecondary structures relied strongly on experts spotting and naming them (4, 12). With the steady growth of the wwPDB, several methods have been developed to identify automatically, under varying operational definitions, libraries of substructures that can be considered the 3D building blocks of protein structures (8, 13–25).
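The idea of a supersecondary structure as a combination of successive secondary structural elements can be illustrated with a small Python sketch. It assumes a per-residue assignment string in which H marks helix and E marks strand (the convention used for the SST assignment in Fig. 1(b)), treats every other symbol as coil, collapses runs into elements, and reports successive strand-helix-strand triplets. The function names and the `min_len` threshold are hypothetical, and this is not the authors' method: it is only a chain-order filter, since a genuine β-α-β unit also requires the two strands to be parallel and paired in the same sheet, which needs 3D coordinates.

```python
from itertools import groupby
from typing import List, Tuple

# (type, start, end) with end exclusive; type is "H" (helix) or "E" (strand).
SSE = Tuple[str, int, int]


def collapse_to_sses(ss_string: str, min_len: int = 1) -> List[SSE]:
    """Collapse a per-residue assignment into ordered secondary structural elements.

    Assumption: only H and E are kept; every other symbol is treated as coil,
    and runs shorter than min_len are ignored.
    """
    sses: List[SSE] = []
    pos = 0
    for symbol, run in groupby(ss_string):
        length = sum(1 for _ in run)
        if symbol in ("H", "E") and length >= min_len:
            sses.append((symbol, pos, pos + length))
        pos += length
    return sses


def find_strand_helix_strand(sses: List[SSE]) -> List[int]:
    """Indices i where elements i, i+1, i+2 are strand, helix, strand in chain order.

    Chain order only: parallel pairing of the two strands is NOT checked.
    """
    types = [t for t, _, _ in sses]
    return [i for i in range(len(types) - 2) if types[i:i + 3] == ["E", "H", "E"]]


# Hypothetical assignment: strand, helix, strand, helix, strand.
ss = "--" + "E" * 5 + "---" + "H" * 10 + "--" + "E" * 5 + "--" + "H" * 7 + "--" + "E" * 4 + "-"
elements = collapse_to_sses(ss, min_len=3)
print(elements)
print(find_strand_helix_strand(elements))  # [0, 2]
```

The two hits overlap in the middle strand, which is one reason a dictionary of larger concepts, rather than a list of isolated two- to three-element motifs, is needed to dissect a whole chain consistently.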
However, these approaches yielded limited libraries containing mostly short oligopeptide fragments or assemblies of typically 2 to 4 secondary structural elements. It has so far been a challenge to go further and dissect protein structures into a more complete set that includes larger conserved substructures. Apart from the enormous computational challenge this problem poses, previous attempts lacked a statistically rigorous framework in which to describe, compute, identify and resolve a dictionary of conserved assemblies of secondary structures.

Here we address this problem and present a universal dictionary of substructural concepts, PROÇODIC, that advances the current knowledge of these conserved patterns. Our approach relies on the rigorous information-theoretic framework of Minimum Message Length (MML) inference, which allows the inference of a dictionary that (a) avoids overfitting (i.e., inferring a dictionary that is more complex than necessary to explain the observed folding patterns) and (b) achieves an objective trade-off between the descriptive complexity of the concepts in the dictionary and their fidelity (i.e., the amount of compression gained when explaining the observed protein folding patterns). Thus, this work presents the ‘basis set’ of concepts underlying all observed protein folding patterns. PROÇODIC can contribute to understanding fundamental principles of protein structure, correlating concepts with ligand binding sites to suggest function, and applying sequence conservation within concepts to protein structure prediction.

RESULTS

Automatic identification of a dictionary of substructural concepts. This work uses the concise tableau representation of protein folding patterns introduced by Lesk (26), which is based on the idea that the essence of a protein folding pattern is captured by the order, contacts and geometry of the assembly of secondary structural elements along the amino-acid chain.
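To make the tableau idea concrete, the following Python sketch stores a folding pattern as a symmetric matrix over its secondary structural elements (SSEs): the diagonal holds the element type, and each off-diagonal entry holds an orientation label for a pair in contact, or None otherwise. It also checks the requirement, described in this section, that every element of a candidate subtableau of successive SSEs is in contact with at least one other element of that subtableau. Everything here is a simplifying assumption for illustration: contacts are supplied rather than computed from coordinates, the three-way P/A/X label stands in for the finer double-quadrant orientation encoding of Lesk's tableaux, and all names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, float, float]
Tableau = List[List[Optional[str]]]


def interaxial_angle(u: Vector, v: Vector) -> float:
    """Angle in degrees (0 to 180) between two non-zero SSE axis vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def orientation_label(angle: float) -> str:
    """Coarse, hypothetical 3-way label; Lesk's tableaux use a finer encoding."""
    if angle < 45.0:
        return "P"  # roughly parallel
    if angle > 135.0:
        return "A"  # roughly antiparallel
    return "X"      # crossing


def build_tableau(types: Sequence[str], axes: Sequence[Vector],
                  contacts: Sequence[Tuple[int, int]]) -> Tableau:
    """Symmetric matrix: diagonal = SSE type, (i, j) = label if i and j touch, else None."""
    n = len(types)
    tab: Tableau = [[None] * n for _ in range(n)]
    for i, sse_type in enumerate(types):
        tab[i][i] = sse_type
    for i, j in contacts:
        label = orientation_label(interaxial_angle(axes[i], axes[j]))
        tab[i][j] = tab[j][i] = label
    return tab


def subtableau(tab: Tableau, start: int, stop: int) -> Tableau:
    """Rows and columns for the successive SSEs start .. stop - 1."""
    return [row[start:stop] for row in tab[start:stop]]


def satisfies_contact_rule(sub: Tableau) -> bool:
    """Two or more SSEs, each in contact with at least one other SSE of the subtableau."""
    n = len(sub)
    return n >= 2 and all(
        any(sub[i][j] is not None for j in range(n) if j != i) for i in range(n)
    )


# Hypothetical strand-helix-strand unit followed by a helix (index 3) with no contacts.
types = ["E", "H", "E", "H"]
axes = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (0.1, 0.0, 1.0), (1.0, 0.0, 0.0)]
tab = build_tableau(types, axes, contacts=[(0, 1), (0, 2), (1, 2)])
print(tab[0][2])                                      # P
print(satisfies_contact_rule(subtableau(tab, 0, 3)))  # True
print(satisfies_contact_rule(subtableau(tab, 1, 4)))  # False
```

The last window fails because its final helix touches neither of the other two elements. Note that the rule as stated only forbids isolated elements; it does not by itself require the contact graph of a subtableau to be connected.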
A tableau corresponds to the 3D structure of a single protein domain (or sometimes a chain) and has the form of a symmetric matrix (Fig. 1(a,c); Supplementary §S1). Importantly, in this representation, supersecondary structures find compact and computable definitions as subtableaux containing two or more successive secondary structural elements in contact (Fig. 1(d-e)).

We constructed the universal dictionary reported here using our recently developed method to infer, automatically, conserved assemblies of secondary structural elements within any given source collection of tableaux (27). The notion of a concept is constrained by the requirement that every secondary structural element in the concept must be in contact with at least one other secondary structural element in that concept. Our concept-inference approach (27) is based on the minimum message length criterion for statistical inductive inference (28–30) and on lossless data compression (Supplementary §S2). We applied this method to compress the source collection of Astral SCOP domains (9, 10, 31) (Supplementary §S1). This allowed us to infer a dictionary of 1,493 substructural concepts that most concisely and losslessly describes the entire source collection, and does so without any prior knowledge or preconceived notions of these recurrent substructures. The total computational effort to identify this dictionary is equivalent to about 7 years of runtime on a modern computer; we parallelised our method and ran it on a high-performance computing cluster (Supplementary §S2).

Fig. 1.
(a) Secondary-structural cartoon representation of the crystal structure of the actin-binding protein actophorin from Acanthamoeba (1AHQ) (32). (b) Secondary structural assignment (using SST (33); H = helix, E = strand of sheet) and the optimal dissection of the protein chain