IG-VAE: Generative Modeling of Immunoglobulin Proteins by Direct


bioRxiv preprint doi: https://doi.org/10.1101/2020.08.07.242347; this version posted August 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

IG-VAE: GENERATIVE MODELING OF IMMUNOGLOBULIN PROTEINS BY DIRECT 3D COORDINATE GENERATION

Raphael R. Eguchi, Department of Biochemistry, Stanford University
Namrata Anand, Department of Bioengineering, Stanford University
Christian A. Choe, Department of Bioengineering, Stanford University
Po-Ssu Huang, Department of Bioengineering, Stanford University

ABSTRACT

While deep learning models have seen increasing applications in protein science, few have been implemented for protein backbone generation, an important task in structure-based problems such as active site and interface design. We present a new approach to building class-specific backbones, using a variational auto-encoder to directly generate the 3D coordinates of immunoglobulins. Our model is torsion- and distance-aware, learns a high-resolution embedding of the dataset, and generates novel, high-quality structures compatible with existing design tools. We show that the Ig-VAE can be used to create a computational model of a SARS-CoV2-RBD binder via latent space sampling. We further demonstrate that the model’s generative prior is a powerful tool for guiding computational protein design, motivating a new paradigm under which backbone design is solved as a constrained optimization problem in the latent space of a generative model.

1 Introduction

Over the past two decades, the field of protein design has made steady progress, with new computational methods providing novel solutions to challenging problems such as enzyme catalysis, viral inhibition, de novo structure generation and more[1, 2, 3, 4, 5, 6, 7, 8, 9].
Building on the improvements in force field accuracy and conceptual advancements in understanding structural engineering principles, computational protein design is seemingly ready to take on any engineering challenge. However, designs created computationally rarely match the performance of variants derived from directed evolution experiments; there is still much room for improvement. While design force fields are known to be imperfect, a major limitation of computational design also stems from the difficulty in modeling backbone flexibility, specifically, movements in loops and the relative geometries of the structural elements in the protein. When comparing structural changes between pre- and post-evolution structures, we often find that the polypeptide chain responds to sequence changes in ways not predictable by the design algorithm. This is because current methods, such as Rosetta[10], tend to restrict backbone conformations to local energy minima. While this issue is usually addressed by sampling backbone torsional angles from known fragments, such a stochastic optimization process is discrete, sparse, time-consuming, and rarely uncovers the true ground state.

To address this fundamental challenge in modeling protein structure, we explore the use of deep neural networks to infer a continuous structural space in which we can smoothly interpolate between different backbone conformations. In doing so we seek to capture elements of backbone motility that are otherwise difficult to reflect in rigid body design. We focus on the task of antibody design, as it harbors many challenges common across different protein design problems and is of significant practical importance: applications of antibodies have been found in every sector of biotechnology, from diagnostic procedures to immunotherapy.

With recent advances in deep learning technology, machine learning tools have seen increasing applications in protein science, with deep neural networks being applied to tasks such as sequence design[11], fold recognition[12], binding site prediction[13], and structure prediction[14, 15]. Generative models, which approximate the distributions of the data they are trained on, have garnered interest as a data-driven way to create novel proteins. Unfortunately, the majority of protein generators create 1D amino acid sequences[16, 17], making them unsuitable for problems that require structure-based solutions such as designing protein-protein interfaces. Very few machine learning models have been implemented for protein backbone generation[18, 19], and none of the reported methods produce structures comparable in quality to those of experimentally validated tools such as RosettaRemodel[20].

Previously, our group was the first to report a Generative Adversarial Network (GAN) that generated 64-residue backbones that could serve as templates for de novo design[19]. This prior work focused on unconditioned structure generation via a distance matrix representation, and 3D coordinates were recovered using a convex optimization algorithm[19] or a learned coordinate recovery module[21].

Despite its novelty, the GAN method is accompanied by certain difficulties. The generated distance constraints are not guaranteed to be Euclidean-valid, and thus it is not possible to recover 3D coordinates that perfectly satisfy the generated constraints. We found that the quality of the recovered backbone torsions could often be degraded due to inconsistencies or errors in the generated distance constraints, leading to loss of important biochemical features, such as hydrogen-bonding.
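
To make the Euclidean-validity issue concrete, the following is a minimal NumPy sketch; it is not taken from the paper. It relies on the classical multidimensional scaling criterion: a matrix of pairwise distances can be realized by points in 3D exactly when its double-centered Gram matrix is positive semidefinite with rank at most 3. The function names, the tolerance, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def gram_from_distances(D):
    """Double-center the squared distances to get the Gram matrix of centered points."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return -0.5 * J @ (D ** 2) @ J

def is_euclidean_valid_3d(D, tol=1e-6):
    """Return True if D is realizable by points in 3D (Gram matrix PSD, rank <= 3)."""
    eigvals = np.linalg.eigvalsh(gram_from_distances(D))   # ascending order
    scale = max(np.abs(eigvals).max(), 1.0)                # relative tolerance
    no_negative = eigvals[0] >= -tol * scale
    rank = int(np.sum(eigvals > tol * scale))
    return bool(no_negative and rank <= 3)

rng = np.random.default_rng(0)

# Distances computed from real 3D points are valid by construction.
X = rng.normal(size=(16, 3))
D_true = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# A symmetric perturbation (a stand-in for generator error, purely hypothetical)
# generally cannot be realized by any set of 3D points.
noise = rng.normal(scale=0.5, size=D_true.shape)
noise = (noise + noise.T) / 2
D_noisy = np.abs(D_true + noise)
np.fill_diagonal(D_noisy, 0.0)

print(is_euclidean_valid_3d(D_true), is_euclidean_valid_3d(D_noisy))
```

When the check fails, any coordinate recovery step can only approximate the requested distances, which is consistent with the degraded torsions described above.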
Moreover, since the GAN is trained on peptide fragments, we are not guaranteed to sample constraints for globular structures that can eventually fold.

In this study, we use a variational autoencoder (VAE)[22] to perform direct 3D coordinate generation of full-atom protein backbones, circumventing the need to recover coordinates from pairwise distance constraints, and also avoiding the problem of distance matrix validity[23]. Using a coordinate representation has allowed us to make our model torsion-aware, significantly improving the quality of the generated backbones.

Importantly, our model motivates a conceptually new way of solving protein design problems. Because the Ig-VAE generates coordinates directly, all of its outputs are fully differentiable through the coordinate representation. This allows us to use the generative prior to constrain structure generation with any differentiable coordinate-based heuristic, such as Rosetta energy, backbone shape constraints, packing metrics, and more. By optimizing a structure in the VAE latent space, designers are able to specify a set of desired structural features while the model creates the rest of the molecule. As an example of this approach, we use the Ig-VAE to perform constrained loop generation, towards epitope-specific antibody design. Our technology ultimately paves the way for a novel approach to protein design in which backbone construction is solved via constrained optimization in the latent space of a generative model. In contrast to conventional methods[20, 24], we term this approach “generative design.”

Figure 1: VAE Training Scheme. The flow of data is shown with black arrows, and losses are shown in blue. First, Ramachandran angles and distance matrices are computed from the full-atom backbone coordinates of a training example. The distance matrix is passed to the encoder network (E), which generates a latent embedding that is passed to the decoder network (D).
The decoder directly generates coordinates in 3D space, from which the reconstructed Ramachandran angles and distance matrix are computed. Errors from both the angles and distance matrix are back-propagated through the 3D coordinate representation to the encoder and decoder. Note that both the torsion and distance matrix losses are rotationally and translationally invariant, and that the coordinates of the training example are never seen by the model. The shown data are real inputs and outputs of the VAE for the immunoglobulin chain in PDB:4YXH(L).

2 Methods

2.1 Dataset

All training data were collected from the antibody structure database AbDb[25]. Domains that were missing residues were excluded, and sequence-redundant structures were included to allow the network to learn small backbone fluctuations. The final training set comprises 10768 immunoglobulins spanning 4154 non-sequence-redundant structures, including single-domain antibodies. The training set covers close to 100% of the AbDb database. Structures in the dataset vary in length from 89 to 138 residues, with most falling between 114 and 130. Since the input of our model was fixed at 128 residues (512 atoms), structures larger than 128 were center-cropped. Structures smaller than 128 were “structurally padded” by using RosettaRemodel to append dummy residues to the N and C termini. The reconstruction loss of the padded regions was down-weighted over the course of training (see Supplemental Methods). We found that the structural padding step led to slight improvements in local bond geometries at the terminal
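
The fixed 128-residue input and the invariance properties noted in the Figure 1 caption can be illustrated with a short NumPy sketch. This is an assumption-laden illustration rather than the authors' pipeline: the `(L, 4, 3)` array layout (four backbone atoms per residue, inferred from 512 atoms over 128 residues), the helper names, and the random stand-in coordinates are all hypothetical. It shows center-cropping, that a distance matrix is unchanged by rotation and translation, and that a signed dihedral angle flips under mirroring, which distances alone cannot detect.

```python
import numpy as np

N_RES = 128  # fixed input length stated in the text (512 atoms, 4 per residue)

def center_crop(coords, n_res=N_RES):
    """Keep the central n_res residues of an (L, 4, 3) backbone with L >= n_res."""
    L = coords.shape[0]
    if L < n_res:
        raise ValueError("shorter chains need structural padding, not cropping")
    start = (L - n_res) // 2
    return coords[start:start + n_res]

def distance_matrix(coords):
    """All-atom pairwise distances, shape (4L, 4L)."""
    atoms = coords.reshape(-1, 3)
    diff = atoms[:, None, :] - atoms[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four points."""
    b0, b1, b2 = p0 - p1, p2 - p1, p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1      # component of b0 orthogonal to b1
    w = b2 - np.dot(b2, b1) * b1      # component of b2 orthogonal to b1
    return np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w))

def random_rigid_transform(coords, rng):
    """Apply a random proper rotation and translation."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:          # force det = +1 (no reflection)
        Q[:, 0] *= -1
    return coords @ Q.T + rng.normal(size=3)

rng = np.random.default_rng(0)
chain = rng.normal(size=(138, 4, 3)) * 10.0   # random stand-in, not a real protein
cropped = center_crop(chain)
moved = random_rigid_transform(cropped, rng)
mirrored = cropped * np.array([-1.0, 1.0, 1.0])

D, D_moved, D_mirror = map(distance_matrix, (cropped, moved, mirrored))
atoms, atoms_moved, atoms_mirror = (c.reshape(-1, 3) for c in (cropped, moved, mirrored))
phi = dihedral(*atoms[:4])
phi_moved = dihedral(*atoms_moved[:4])
phi_mirror = dihedral(*atoms_mirror[:4])
```

Because both quantities are unchanged by rigid motion, a loss built on them does not depend on how a structure is posed in space; the sign change of the dihedral under mirroring suggests one reason a torsion term adds information beyond distances.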