Signature of Author

Total Page:16

File Type:pdf, Size:1020Kb

Signature of Author An Analysis of Representations for Protein Structure Prediction by Barbara K. Moore Bryant S.B., S.M., Massachusetts Institute of Technology (1986) Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY September 1994 ( Massachusetts Institute of Technology 1994 SignatureofAuthor ............. ,...... ........-.. .. Department of Electrical Engineering and Computer Science September 2, 1994 Certified by... ..... -- Patrick. H. Winston Professor, Electrical Engineering and Computer Science This Supervisor %-"ULfrtifioAvL.Lv--k &y ho, . Tomas o) no-Perez and Computer Science L! Thp.i. .,nPrvi.qnr Accepted by .. ... , a. ,- · F' drY................... / Frederic~) R. Morgenthaler Departmental /Committee on Graduate Students Iv:r tfF:r _ ;: ! --.. t.8r'(itr i-7etFr An Analysis of Representations for Protein Structure Prediction by Barbara K. Moore Bryant Submitted to the Department of Electrical Engineering and Computer Science on September 2, 1994, in partial fulfillment of the requirements for the degree of Doctor of Philosophy Abstract Protein structure prediction is a grand challenge in' the fields of biology and computer science. Being able to quickly determine the structure of a protein from its amino acid sequence would be extremely useful to biologists interested in elucidating the mechanisms of life and finding ways to cure disease. In spite of a wealth of knowl- edge about proteins and their structure, the structure prediction problem has gone unsolved in the nearly forty years since the first determination of a protein structure by X-ray crystallography. In this thesis, I discuss issues in the representation of protein structure and se- quence for algorithms which perform structure prediction. There is a tradeoff between the complexity of the representation and the accuracy to which we can determine the empirical parameters of the prediction algorithms. I am concerned here with method- ologies to help determine how to make these tradeoffs. In the course of my exploration of several particular representation schemes, I find that there is a very strong correlation between amino acid type and the degree to which residues are exposed to the solvent that surrounds the protein. In addition to confirming current models of protein folding, this results suggests that solvent exposure should be an element of protein structure representation. Thesis Supervisor: Patrick H. Winston Title: Professor, Electrical Engineering and Computer Science Thesis Supervisor: Tomas Lozano-Perez Title: Professor, Electrical Engineering and Computer Science 2 Acknowledgments F'or their support, I thank my thesis committee, the members of the Artificial Intelli- gence Laboratory, Jonathan King's group, the biology reading group, Temple Smith's core modeling group at Boston University, NIH, my friends and family. The following people are some of those who have provided support and encour- agement in many ways. To those who I've forgotten to list, my apologies, and thanks. Bill Bryant, Ljubomir Buturovic, John Canny, Peggy Carney, Kevin Cunningham, Sudeshna Das, Cass Downing-Bryant, Rich Ferrante, David Gifford, Lydia Gregoret, Eric Grimson, Nomi Harris, Roger Kautz, Jonathan King, Ross King, Tom Knight, Kimberle Koile, Rick Lathrop, Tomas Lozano-P6rez, Eric Martin, Mark Matthews, Michael de la Maza, Bettina McGimsey, Scott Minneman, Doug Moore, Jan Moore, Stan Moore, Ilya Muchnik, Raman Nambudripad, Sundar Narasimhan, Carl Pabo, Marilyn Pierce, Geoff Pingree, Tomaso Poggio, Ken Rice, Karen Sarachik, Richard Shapiro, Laurel Simmons, Temple Smith, D. Srikar Rao, Susan Symington, James Tate, Bruce Tidor, Bruce Tidor, Lisa Tucker-Kellogg, Paul Viola, Bruce Walton, Teresa Webster, Jeremy Wertheimer, Jim White, Patrick Winston, and Tau-Mu Yi. 3 Contents 1 Overview 15 1.1 Why predict protein structures? .............. .. 15 1.2 Background ......................... 20 1.2.1 Secondary structure prediction . 23 1.2.2 Tertiary structure prediction . 25 1.3 My work . 30 1.3.1 Hydrophobic collapse vs. structure nucleation and propagation 31 1.3.2 Modeling hydrophobic collapse. 32 1.3.3 Pairwise interactions: great expectations . .. 33 1.3.4 Amphipathicity ................... 34 1.4 Outline of thesis . 34 2 Background 35 2.1 Knowledge representation. 35 2.1.1 Single residue attributes ................... 36 2.1.2 Attributes of residue pairs .................. 39 2.2 Contingency table analysis ..... .............. 39 2.2.1 Contingency tables. 40 2.2.2 Questions asked in contingency table analysis ....... 40 2.2.3 Models of data ........................ 41 2.2.4 A simple contingency table example ............. 42 2.2.5 Loglinear models ....................... 47 2.2.6 Margin counts . 48 2.2.7 Computing model parameters. 49 2.2.8 Examining conditional independence with loglinear models 49 2.2.9 Comparing association strengths of variables ........ 49 2.2.10 Examples of questions .................... 52 3 Single-Residue Statistics 54 3.1 Introduction. 54 3.2 Method .............. ............. 54 3.2.1 Data . ............. 54 3.2.2 Residue attributes . ............. 55 3.2.3 Contingency table analysis ............. 55 3.3 Results and Discussion ...... ............. 56 4 3.3.1 Loglinear models. ..... 56 3.3.2 Model hierarchies. ..... 59 3.3.3 Nonspecific vs. specific amino acid representation . ..... 65 3.4 Conclusions ......................... .... 71 3.4.1 Solvent exposure is of primary importance .... ..... 71 3.4.2 Grouping residues by hydrophobicity class .... ..... 71 3.4.3 Attribute preferences .............. ...... 72 4 Paired-Residue Statistics 73 4.1 Introduction. 73 4.2 Methods. 74 4.2.1 Data. 74 4.2.2 Representational attributes .............. 74 4.3 Results and Discussion ..................... 76 4.3.1 Non-contacting pairs. 76 4.3.2 Contacting pairs; residues grouped into three classes . 81 4.3.3 Contacting residues: all twenty amino acids ...... 85 4.4 Conclusions ........................... 87 5 Distinguishing Parallel from Antiparallel 89 5.1 Summary . 89 5.2 Methods ............................. 91 5.2.1 Protein data ....................... 91 5.2.2 Definition of secondary structure, and topological relationships. 91 5.2.3 Counting and statistics ..................... 92 5.2.4 Contingency table analysis .................... 94 5.3 Results and Discussion .......................... 94 5.3.1 Amino acid compositions of parallel and antiparallel beta struc- ture. 94 5.3.2 Solvent exposure ........ .101 5.3.3 Position in sheet ........ 105 5.3.4 Grouping residues into classes . 106 5.3.5 Contingency Table Analysis . 106 5.4 Implications ............... ... ... 111 5.4.1 Protein folding and structure . ...... ... .... 111 5.4.2 Secondary structure prediction . ... ... 111 5.4.3 Tertiary structure prediction . 112 6 Pairwise Interactions in Beta Sheets 113 6.1 Introduction. 113 6.2 Method. 114 6.2.1 Significance testing ....... 115 6.3 Results and Discussion ......... 115 6.3.1 Counts and preferences ..... 115 6.3.2 Significant specific "recognition" of beta pairs . .. 128 6.3.3 Significant nonspecific recognition ................ 128 6.3.4 Solvent exposure ........... 129 6.3.5 Five-dimensional contingency table . 131 6.3.6 Nonrandom association of exposure ............... 131 6.3.7 Unexpected correlation in buried (i,j + 1) pairs. 135 6.4 Conclusions.......... 139 7 Secondary Structure Prediction 141 7.1 Introduction. 141 7.2 Related Work ............................... 142 7.2.1 Hydrophobicity patterns in other secondary structure predic- tion methods .................. 142 7.2.2 Neural Nets for Predicting Secondary Structure 144 7.2.3 Periodic features in sequences ......... 146 7.3 A representation for hydrophobicity patterns ..... 147 7.3.1 I, and I: maximum hydrophobic moments 148 7.3.2 Characterization of I, and I, ......... 150 7.3.3 De,cision rule based on I, and Io ....... 152 7.4 Method ............ 153 . :. :. 7.4.1 Neural network .... 157 7.4.2 Data .......... 159 7.4.3 Output representation 159 7.4.4 Input representation . 161 7.4.5 Network performance 162 7.4.6 Significancetests . 165 7.5 Results. ............ ............. 166 7.5.1 Performance. ............. 166 7.5.2 Amphipathicity .... ............. 166 7.5.3 Amino acid encoding ............. 168 7.5.4 Learning Curves .... ............. 169 7.5.5 ROC curves ...... ............. 169 7.5.6 Weights ........ ............. 173 7.6 Conclusion. .......... ............. 180 8 Threading 181 8.1 Introduction. 182 8.1.1 Threading algorithm. 182 8.1.2 Pseudopotentials for threading ...... 188 8.1.3 Sample size problems ........... 189 8.1.4 Structure representations. 192 8.1.5 Incorporating local sequence information 193 8.2 Method ...................... 194 8.2.1 Data. 194 8.2.2 Threading code .............. 194 8.2.3 Structure representations ......... 194 6 S.2.4 Amino acid counts ........................ 196 $.2.5 Scores for threading; padding .................. 196 8.2.6 Incorporating local sequence information. 197 8.3 Results and Discussion ........................ 197 $.3.1 Counts. 197 S.3.2 Likelihood ratios ........... 197 8.3.3 Comparison of singleton pseudopotential components 197 8.3.4 Incorporating sequence window information .......... 205 S.3.5 Splitting the beta structure representation into parallel and an- tiparallel. 207 8.4 Conclusions. 210 9 Work in Progress 212 9.1 (Computer representations of proteins. 212 9.1.1 Automation. 212
Recommended publications
  • Meeting Review: Bioinformatics and Medicine – from Molecules To
    Comparative and Functional Genomics Comp Funct Genom 2002; 3: 270–276. Published online 9 May 2002 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.178 Feature Meeting Review: Bioinformatics And Medicine – From molecules to humans, virtual and real Hinxton Hall Conference Centre, Genome Campus, Hinxton, Cambridge, UK – April 5th–7th Roslin Russell* MRC UK HGMP Resource Centre, Genome Campus, Hinxton, Cambridge CB10 1SB, UK *Correspondence to: Abstract MRC UK HGMP Resource Centre, Genome Campus, The Industrialization Workshop Series aims to promote and discuss integration, automa- Hinxton, Cambridge CB10 1SB, tion, simulation, quality, availability and standards in the high-throughput life sciences. UK. The main issues addressed being the transformation of bioinformatics and bioinformatics- based drug design into a robust discipline in industry, the government, research institutes and academia. The latest workshop emphasized the influence of the post-genomic era on medicine and healthcare with reference to advanced biological systems modeling and simulation, protein structure research, protein-protein interactions, metabolism and physiology. Speakers included Michael Ashburner, Kenneth Buetow, Francois Cambien, Cyrus Chothia, Jean Garnier, Francois Iris, Matthias Mann, Maya Natarajan, Peter Murray-Rust, Richard Mushlin, Barry Robson, David Rubin, Kosta Steliou, John Todd, Janet Thornton, Pim van der Eijk, Michael Vieth and Richard Ward. Copyright # 2002 John Wiley & Sons, Ltd. Received: 22 April 2002 Keywords: bioinformatics;
    [Show full text]
  • Algorithms for Computational Biology 8Th International Conference, Alcob 2021 Missoula, MT, USA, June 7–11, 2021 Proceedings
    Lecture Notes in Bioinformatics 12715 Subseries of Lecture Notes in Computer Science Series Editors Sorin Istrail Brown University, Providence, RI, USA Pavel Pevzner University of California, San Diego, CA, USA Michael Waterman University of Southern California, Los Angeles, CA, USA Editorial Board Members Søren Brunak Technical University of Denmark, Kongens Lyngby, Denmark Mikhail S. Gelfand IITP, Research and Training Center on Bioinformatics, Moscow, Russia Thomas Lengauer Max Planck Institute for Informatics, Saarbrücken, Germany Satoru Miyano University of Tokyo, Tokyo, Japan Eugene Myers Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany Marie-France Sagot Université Lyon 1, Villeurbanne, France David Sankoff University of Ottawa, Ottawa, Canada Ron Shamir Tel Aviv University, Ramat Aviv, Tel Aviv, Israel Terry Speed Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia Martin Vingron Max Planck Institute for Molecular Genetics, Berlin, Germany W. Eric Wong University of Texas at Dallas, Richardson, TX, USA More information about this subseries at http://www.springer.com/series/5381 Carlos Martín-Vide • Miguel A. Vega-Rodríguez • Travis Wheeler (Eds.) Algorithms for Computational Biology 8th International Conference, AlCoB 2021 Missoula, MT, USA, June 7–11, 2021 Proceedings 123 Editors Carlos Martín-Vide Miguel A. Vega-Rodríguez Rovira i Virgili University University of Extremadura Tarragona, Spain Cáceres, Spain Travis Wheeler University of Montana Missoula, MT, USA ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Bioinformatics ISBN 978-3-030-74431-1 ISBN 978-3-030-74432-8 (eBook) https://doi.org/10.1007/978-3-030-74432-8 LNCS Sublibrary: SL8 – Bioinformatics © Springer Nature Switzerland AG 2021 This work is subject to copyright.
    [Show full text]
  • Microblogging the ISMB: a New Approach to Conference Reporting
    Message from ISCB Microblogging the ISMB: A New Approach to Conference Reporting Neil Saunders1*, Pedro Beltra˜o2, Lars Jensen3, Daniel Jurczak4, Roland Krause5, Michael Kuhn6, Shirley Wu7 1 School of Molecular and Microbial Sciences, University of Queensland, St. Lucia, Brisbane, Queensland, Australia, 2 Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, California, United States of America, 3 Novo Nordisk Foundation Center for Protein Research, Panum Institute, Copenhagen, Denmark, 4 Department of Bioinformatics, University of Applied Sciences, Hagenberg, Freistadt, Austria, 5 Max-Planck-Institute for Molecular Genetics, Berlin, Germany, 6 European Molecular Biology Laboratory, Heidelberg, Germany, 7 Stanford Medical Informatics, Stanford University, Stanford, California, United States of America Cameron Neylon entitled FriendFeed for Claire Fraser-Liggett opened the meeting Scientists: What, Why, and How? (http:// with a review of metagenomics and an blog.openwetware.org/scienceintheopen/ introduction to the human microbiome 2008/06/12/friendfeed-for-scientists-what- project (http://friendfeed.com/search?q = why-and-how/) for an introduction. room%3Aismb-2008+microbiome+OR+ We—a group of science bloggers, most fraser). The subsequent Q&A session of whom met in person for the first time at covered many of the exciting challenges The International Conference on Intel- ISMB 2008—found FriendFeed a remark- for those working in this field. Clearly, ligent Systems for Molecular Biology
    [Show full text]
  • Complete Vertebrate Mitogenomes Reveal Widespread Gene Duplications and Repeats
    bioRxiv preprint doi: https://doi.org/10.1101/2020.06.30.177956; this version posted July 1, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Complete vertebrate mitogenomes reveal widespread gene duplications and repeats Giulio Formenti+*1,2,3, Arang Rhie4, Jennifer Balacco1, Bettina Haase1, Jacquelyn Mountcastle1, Olivier Fedrigo1, Samara Brown2,3, Marco Capodiferro5, Farooq O. Al-Ajli6,7,8, Roberto Ambrosini9, Peter Houde10, Sergey Koren4, Karen Oliver11, Michelle Smith11, Jason Skelton11, Emma Betteridge11, Jale Dolucan11, Craig Corton11, Iliana Bista11, James Torrance11, Alan Tracey11, Jonathan Wood11, Marcela Uliano-Silva11, Kerstin Howe11, Shane McCarthy12, Sylke Winkler13, Woori Kwak14, Jonas Korlach15, Arkarachai Fungtammasan16, Daniel Fordham17, Vania Costa17, Simon Mayes17, Matteo Chiara18, David S. Horner18, Eugene Myers13, Richard Durbin12, Alessandro Achilli5, Edward L. Braun19, Adam M. Phillippy4, Erich D. Jarvis*1,2,3, and The Vertebrate Genomes Project Consortium +First author *Corresponding authors; [email protected]; [email protected] Author affiliations 1. The Vertebrate Genome Lab, Rockefeller University, New York, NY, USA 2. Laboratory of Neurogenetics of Language, Rockefeller University, New York, NY, USA 3. The Howards Hughes Medical Institute, Chevy Chase, MD, USA 4. Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD USA 5. Department of Biology and Biotechnology “L. Spallanzani”, University of Pavia, Pavia, Italy 6. Monash University Malaysia Genomics Facility, School of Science, Selangor Darul Ehsan, Malaysia 7.
    [Show full text]
  • PROGRAM CHAIR: Eben Rosenthal, MD PROFFERED PAPERS CHAIR: Ellie Maghami, MD POSTER CHAIR: Maie St
    AHNS 10TH INTERNATIONAL CONFERENCE ON HEAD & NECK CANCER “Survivorship through Quality & Innovation” JULY 22-25, 2021 • VIRTUAL CONFERENCE AHNS PRESIDENT: Cherie-Ann Nathan, MD, FACS CONFERENCE/DEVELOPMENT CHAIR: Robert Ferris, MD, PhD PROGRAM CHAIR: Eben Rosenthal, MD PROFFERED PAPERS CHAIR: Ellie Maghami, MD POSTER CHAIR: Maie St. John, MD Visit www.ahns2021.org for more information. WELCOME LETTER Dear Colleagues, The American Head and Neck Society (AHNS) is pleased to invite you to the virtual AHNS 10th International Conference on Head and Neck Cancer, which will be held July 22-25, 2021. The theme is Survivorship through Quality & Innovation and the scientific program has been thoughtfully designed to bring together all disciplines related to the treatment of head and neck cancer. Our assembled group of renowned head and neck surgeons, radiologists and oncologists have identified key areas of interest and major topics for us to explore. The entire conference will be presented LIVE online July 22-25, 2021. We encourage you to attend the live sessions in order to engage with the faculty and your colleagues. After the live meeting, all of the meeting content will be posted on the conference site and remain open for on-demand viewing through October 1, 2021. Attendees may earn up to 42.25 AMA PRA Category 1 Credit(s)TM as well as earn re- quired annual part II self-assessment credit in the American Board of Otolaryngology – Head and Neck Surgery’s Continu- ing Certification program (formerly known as MOC). At the conclusion of the activity,
    [Show full text]
  • Assessing Sequence Comparison Methods with Reliable Structurally Identified Distant Evolutionary Relationships
    Proc. Natl. Acad. Sci. USA Vol. 95, pp. 6073–6078, May 1998 Biochemistry Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships STEVEN E. BRENNER*†‡,CYRUS CHOTHIA*, AND TIM J. P. HUBBARD§ *MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, United Kingdom; and §Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA, United Kingdom Communicated by David R. Davies, National Institute of Diabetes, Bethesda, MD, March 16, 1998 (received for review November 12, 1997) ABSTRACT Pairwise sequence comparison methods have Sequence comparison methodologies have evolved rapidly, been assessed using proteins whose relationships are known so no previously published tests has evaluated modern versions reliably from their structures and functions, as described in of programs commonly used. For example, parameters in the SCOP database [Murzin, A. G., Brenner, S. E., Hubbard, T. BLAST (1) have changed, and WU-BLAST2 (2)—which produces & Chothia C. (1995) J. Mol. Biol. 247, 536–540]. The evalua- gapped alignments—has become available. The latest version tion tested the programs BLAST [Altschul, S. F., Gish, W., of FASTA (3) previously tested was 1.6, but the current release Miller, W., Myers, E. W. & Lipman, D. J. (1990). J. Mol. Biol. (version 3.0) provides fundamentally different results in the 215, 403–410], WU-BLAST2 [Altschul, S. F. & Gish, W. (1996) form of statistical scoring. Methods Enzymol. 266, 460–480], FASTA [Pearson, W. R. & The previous reports also have left gaps in our knowledge. Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444–2448], For example, there has been no published assessment of and SSEARCH [Smith, T.
    [Show full text]
  • A Novel and Effective Scoring Scheme for Structure Classification And
    A novel and effective scoring scheme for structure classification and pairwise similarity measurement Rezaul Karim∗, Md. Momin Al Aziz†,Swakkhar Shatabda‡ and M. Sohel Rahman∗ ∗Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology †Department of Computer Science, University of Manitoba, Canada ‡Department of Computer Science and Engineering, United International University Abstract—Protein tertiary structure defines its functions, clas- However, a major drawback of these approaches with re- sification and binding sites. Similar structural characteristics gards to the global structure similarity comparison is that these between two proteins often lead to the similar characteristics are much sensitive to local structure dissimilarity. Furthermore, thereof. Determining structural similarity accurately in real these approaches need to find an optimal alignment before time is a crucial research issue. In this paper, we present computing the score despite that often there are no remarkable a novel and effective scoring scheme that is dependent on alignment possible. Also finding proper alignment between two novel features extracted from protein alpha carbon distance matrices. Our scoring scheme is inspired from pattern recog- protein structures is computationally expensive if the size of nition and computer vision. Our method is significantly better the protein is large [9]. In spite of these inherent limitations than the current state of the art methods in terms of fam- and drawbacks, there exist a number of such methods in the ily match of pairs of protein structures and other statistical literature and the most notable ones among these are DALI measurements. The effectiveness of our method is tested on [10], [11], CE [12], TM Align [13], [14] and SP Align [15].
    [Show full text]
  • The Johns Hopkins University
    THE JOHNS HOPKINS UNIVERSITY COMMENCEMENT 2019 Conferring of degrees at the close of the 143rd academic year MAY 23, 2019 CONTENTS Order of Candidate Procession ......................................................... 11 Order of Procession .......................................................................... 12 Order of Events ................................................................................. 13 Conferring of Degrees ....................................................................... 14 Honorary Degrees ............................................................................. 17 Academic Garb .................................................................................. 14 Awards ............................................................................................... 16 Honor Societies ................................................................................. 28 Student Honors ................................................................................. 34 Candidates for Degrees ..................................................................... 43 Divisional Ceremonies Information ............................................... 121 Please note that while all degrees are conferred, only doctoral and bachelor’s graduates process across the stage. Taking photos from your seats during the ceremony is allowed, but we request that guests respect one another’s comfort and enjoyment by not standing and blocking other people’s views. Photos of graduates can be purchased from GradImages®: gradimages.com
    [Show full text]
  • Relative Orientation of Close-Packed ,8-Pleated Sheets in Proteins
    Proc. Nati Acad. Sci. USA Vol. 78, No. 7, pp. 4146-4150, July 1981 Biochemistry Relative orientation of close-packed ,8-pleated sheets in proteins (twisted ,3sheets/protein secondary and tertiary structure) CYRUS CHOTHIA*t AND JOEL JANINt *William Ramsay, Ralph Foster and Christopher Ingold Laboratories, University College London, Gower Street, London WC1E 6BT, England; tMedical Research Council Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, England; and *Unit6 de Biochimie Cellulaire, Department de Biochimie et Gen~tique Moleculaire, Institut Pasteur, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France Communicated by Max F. Perutz, March 30, 1981 ABSTRACT When (3-pleated sheets pack face to face in pro- curve from left to right, while those that point down curve from teins, the angle between the strand directions ofthe two (3-sheets right to left. is observed to be near -30°. We propose a simple model for f3- Fig. 2 b and c illustrates some of the effects of twisting two sheet-to-(-sheet packing in concanavalin A, plastocyanin, y-crys- ,(3pleated sheets that are packed together. We show, on the left tallin, superoxide dismutase, prealbumin, and the immunoglobin ofthe figures, two hypothetical untwisted (3-sheets packed with fragment VREI. This model shows how the observed relative ori- their rows of side chains aligned: those in the top (-sheet are entation of two packed (3-sheets is a consequence of (i) the rows parallel to those in the bottom. On the right of the figures we of side chains at the interface being approximately aligned and show what happens when these packed (3-sheets are twisted (ii) the (3-sheet having a right-handed twist.
    [Show full text]
  • Transformer Neural Networks for Protein Prediction Tasks
    bioRxiv preprint doi: https://doi.org/10.1101/2020.06.15.153643; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. TRANSFORMING THE LANGUAGE OF LIFE:TRANSFORMER NEURAL NETWORKS FOR PROTEIN PREDICTION TASKS Ananthan Nambiar ∗ Maeve Heflin∗ Department of Bioengineering Department of Computer Science Carl R. Woese Inst. for Genomic Biol. Carl R. Woese Inst. for Genomic Biol. University of Illinois at Urbana-Champaign University of Illinois at Urbana-Champaign Urbana, IL 61801 Urbana, IL 61801 [email protected] Simon Liu∗ Sergei Maslov Department of Computer Science Department Bioengineering Carl R. Woese Inst. for Genomic Biol. Department of Physics University of Illinois at Urbana-Champaign Carl R. Woese Inst. for Genomic Biol. Urbana, IL 61801 University of Illinois at Urbana-Champaign Urbana, IL 61801 Mark Hopkinsy Anna Ritzy Department of Computer Science Department of Biology Reed College Reed College Portland, OR 97202 Portland, OR 97202 June 16, 2020 ABSTRACT The scientific community is rapidly generating protein sequence information, but only a fraction of these proteins can be experimentally characterized. While promising deep learning approaches for protein prediction tasks have emerged, they have computational limitations or are designed to solve a specific task. We present a Transformer neural network that pre-trains task-agnostic sequence representations. This model is fine-tuned to solve two different protein prediction tasks: protein family classification and protein interaction prediction.
    [Show full text]
  • Predicting RNA Binding Proteins Using Discriminative Methods
    Predicting RNA binding proteins using discriminative methods by Shweta Bhandare M.S. Department of Interdisciplinary Telecommunications, University of Colorado at Boulder 2003 B.E., College of Engineering, University of Goa, 1995 A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Doctor of Philosophy Department of Computer Science 2017 This thesis entitled: Predicting RNA binding proteins using discriminative methods written by Shweta Bhandare has been approved for the Department of Computer Science Prof. Robin Dowell Dr. Debra S. Goldberg Prof. Larry E. Hunter Dr. Daniel Weaver Date The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above mentioned discipline. iii Bhandare, Shweta (Ph.D., Computer Science) Predicting RNA binding proteins using discriminative methods Thesis directed by Prof. Robin Dowell and Dr. Debra S. Goldberg This thesis examines the role of computational methods in identifying the motifs utilized by RNA-binding proteins (RBPs). RBPs play an important role in post-transcriptional regulation and identify their targets in a highly specific fashion through recognition of primary sequence and/or secondary structure hence making the prediction a complex problem. I applied the existing k-spectrum kernel method to a support vector machine and verified the published binding sites of two RBPs: Human antigen R (HuR) and Tristetraprolin (TTP). These RBPs exhibit opposing effects to the bound messenger RNA (mRNA) transcript but have simi- lar binding preferences.
    [Show full text]
  • Wolfson Review Wolfson The
    2012 – 2013 2013 No.37 – 2012 The Wolfson Review Wolfson The THE Wolfson Review 2012 – 2013 2013 No.37 – 2012 Wolfson College Barton Road Cambridge CB3 9BB www.wolfson.cam.ac.uk Upon 50 Years by John McClenahen (1986), Press Fellow The College is stone and mortar, and wood and glass. The College is ideas, great and small. The College is books and the Internet. Published in 2013 by Wolfson College, Cambridge The College is gates, gardens, paths, Barton Road, Cambridge CB3 9BB courts, and plaques, and a sundial. © Wolfson College, 2013 The College is Lee Library, the Dining Hall, and Bredon House. The College is students, and tutors, and Fellows. The College is fellowship, principles, and ritual. And the College, this College, Wolfson College, Cambridge, is much more. For this remarkable College is a diverse universe, ever expanding. From this College, in their diversity, those who study, guide, and reside here seek knowledge and truth in myriad ways. From this College, this special place, those who study, guide, and reside here seek to create, to find, to explore, to challenge, and to validate. Now and forever may their efforts – all our efforts wherever we are – Cover photograph Coloured primary hypothalamic neuronal culture, labelled ring true to the diversity and distinguishing humanness of this College, with MAP2, GFAP and Dapi under microscope, part of our young College in this ancient University. Wolfson Fellow Giles Yeo’s research into the brain control of food intake. Image created by Dr Brian Lam and Mr Joseph Polex-Wolf from the Yeo laboratory. The paper used for the Review contains material sourced from responsibly managed forests, certified in accordance with the Forestry Stewardship Council, and is printed using vegetable based inks.
    [Show full text]