Trends in Computational Biology—2010

Total Page:16

File Type:pdf, Size:1020Kb

Trends in Computational Biology—2010 FEATURE Trends in computational biology—2010 H Craig Mak Interviews with leading scientists highlight several notable breakthroughs in computational biology from the past year and suggest areas where computation may drive biological discovery. he field of computational biology encom- been featured in our pages and elsewhere; oth- Seq), but mapping and assembling the resulting Tpasses a set of investigative tools as much ers were completely off our radar. Although reads are challenging owing to the complexities as being a research endeavor in its own right. we surveyed a small group of 15 scientists, the introduced by RNA splicing. The two methods It is often difficult to gauge the utility and nominated papers (Box 1) provide a snapshot are the first that robustly assemble full-length significance of a computational tool, at least of some of the most exciting areas of current transcripts, including alternative splicing iso- until the research community has had suffi- computational biology research. forms. In contrast to previous approaches, these cient time to explore, exploit and hone it in All the papers featured in the following two methods first map reads to the genome various applications. In an effort to identify pages were nominated by at least two scien- using software that takes possible splice junc- recent notable breakthroughs in the field of tists. Our analysis not only highlights the rich- tions into account, thereby making assembly computational biology, Nature Biotechnology ness of approaches and growth of the field, but more manageable. Then, they apply graph- surveyed leading researchers in the area, also suggests that researchers of a particular based algorithms to determine1,2 and quantify1 asking them to nominate papers of particu- type are driving much of cutting-edge com- the most likely splice isoforms. The algorithms lar interest published in the previous year putational biology (Box 2). Read on to find were applied to mammalian transcriptomes to that have influenced the direction of their out what characterizes them and what they’ve follow global patterns of splicing during a devel- research. Some of the nominated papers had been doing in the past year. opmental time course1 and to identify novel, spliced, long, noncoding RNAs that had not been annotated by existing methods2. Next-generation longer molecule. Researchers face challenges on Progress toward the second challenge sequence analysis two levels when turning massive collections of of genome interpretation was reported in Imagine an experi- reads into biologically meaningful information. papers that demonstrate the potential of ment generating a bil- The first set of challenges lies in processing the genome sequencing © 2011 Nature America, Inc. All rights reserved. All rights Inc. America, Nature © 2011 lion data points every reads themselves: mapping them to their genomic for genetic analysis day, the equivalent of locations, and then assembling them into longer of human traits. The running millions of contiguous stretches of DNA. The second set of approach, pioneered agarose gels—and tap- challenges lies in interpreting large collections by Jay Shendure at ing (remember that!) of reads, which may be assembled into whole the University of the pictures into tens genomes, to understand the functional effects of Washington in Seattle, of thousands of labo- genetic variation. Thousands of genomes from sequences the exonic ratory notebooks, or hybridizing thousands of humans, plants, animals and disease tissues have regions of several gene-expression microarrays. Computational already been sequenced—and all are in need of genomes to identify biology has risen in prominence in recent better interpretation. Although algorithms, protein-disrupting years largely because of the increase in the such as BLAST for searching and CLUSTALW mutations linked to disease. “This study dra- data-generation capacity of high-throughput for aligning, continue to be the workhorses of matically demonstrates how we can make new technology. More data create more opportuni- sequence analysis, several next-generation com- genetic discoveries by sequencing all the exons ties and a more pressing need for systematic putational methods have emerged to cope with in a set of patients,” says Steven Salzberg of the methods of analysis. And nowhere has that the DNA sequences captured in billions of short University of Maryland and a co-author with need been more evident than in the field of reads and thousands of genomes. Pachter1. Since the publication of the first success next-generation sequencing. of this strategy in January 2010 (ref. 3), several The latest sequencers take a week or two to The advance. Two methods for de novo additional studies have taken a similar approach generate about a billion short reads, stretches transcriptome assembly of short reads were to study the genetics of human diseases. of about 50–400 bp of DNA sequenced from a published this year from Lior Pachter and col- leagues1 and from Aviv Regev and colleagues2. What it means. Advances in transcript assem- H. Craig Mak is Associate Editor, The transcriptome can be analyzed by sequenc- bly from RNA-Seq data should allow alterna- Nature Biotechnology ing cDNA reverse transcribed from RNA (RNA- tive splicing to be studied genome-wide across NATURE BIOTECHNOLOGY VOLUME 29 NUMBER 1 JANUARY 2011 45 FEatUrE many biological con- tific advances. As a result of incentives built tested to see whether they carried a total of ditions, such as in into the Health Information Technology for five SNPs previously associated with seven different tissues, over Economic and Clinical Health (HITECH) diseases (coronary artery disease, carotid different time points Act passed as part of the Obama administra- artery stenosis, atrial fibrillation, multiple and in response to tion’s health-care reform, in 10 years almost sclerosis, lupus, rheumatoid arthritis and genetic and chemi- every US hospital could be using electronic Crohn’s disease). To identify patient pheno- cal perturbations. In medical records, up from 1.5–2% of hospitals types in an automated fashion, they used bill- addition, RNA-Seq today. “What’s fun to think about,” says Atul ing codes in the electronic medical records is now poised as a Butte of Stanford University, “is what kind of to group patients into ‘case’ and ‘control’ tool for discover- science can we derive from this?” populations for 776 phenotypes. Finally, Steven Salzberg ing new RNAs that Electronic medical records can contain a Chi-squared statistical test was used to collaborated with Lior we may not have a wealth of history on physical exams and evaluate whether patients harboring a spe- Pachter and Barbara Wold to develop a even known were treatment regimes. Particularly amenable cific SNP also tended to display a particular method for assembling transcribed from a to automated analysis in the records are the phenotype. The authors noted that, although short reads sequenced genome. Armed with standardized administrative billing codes used there are many statistical challenges with from cDNA into better knowledge to charge for each procedure, test or clinical this kind of analysis and there is much room full-length spliced of splicing patterns visit. These codes can for improvement of their method, four of transcripts. and comprehensive track anything from the seven previously known disease-gene transcript catalogs, diagnosis of coronary associations could be replicated, and several it should be possible to improve the annota- artery disease to the potential associations with other diseases tion of genomes. “Thousands of people have procedure for insert- were identified but not rigorously validated. accessed our software,” says Salzberg, “and it ing a stent to keep These results highlight the possibility that is being integrated into easy-to-use graphical blood vessels open. novel biological discoveries might be made interfaces such as Galaxy”4. Several hospitals, using this approach. The flood of whole genomes and exomes with Vanderbilt Uni­ should also drive sequence-analysis methods. versity (Nashville, What it means. “Everyone wishes they could Attending an International Cancer Genome TN, USA) among Atul Butte: “Ninety- do this kind of study,” remarks Butte, “but it nine percent of the Consortium meeting in Brisbane, Australia, in the leaders, are pair- work is not in software represents a multimillion dollar investment. December, Debbie Marks of Harvard University ing electronic medi- engineering or coding, Vanderbilt is leading the way.” Several other noted, “We’re still in the early days of whole- cal records with the it’s in coming up hospitals are making similar investments. genome analysis…problems need to be articu- collection of tissue with the right kind of The Mayo Clinic, for example, is coupling lated in ways that are computationally tractable.” samples from every question:…[one that] specimens collected from 20,000 patients Many of the problems will require algorithmic patient treated. These no one even realizes with electronic medical records and other we can ask today.” advances, such as better de novo assembly of resources represent data gathered and standardized across the transcriptomes and genomes. However, much an unprecedented hospital system. “There has always been a of the work that goes into interpreting a patient’s source of data on the genetic and physiologi- question about whether electronic medi- whole genome to identify disease-causing cal state of people linked to standardized, com- cal records would be of sufficient quality to mutations, for instance, involves filtering vari- putable records of their phenome, or the set of allow genetic discovery,” says Russ Altman, © 2011 Nature America, Inc. All rights reserved. All rights Inc. America, Nature © 2011 ants identified by sequencing against databases all phenotypes including disease diagnosis and also at Stanford. “This paper sets the stage of known variants. As better databases should responses to treatment. for widespread use of electronic records for lead to improved genome analysis, it might be genomic discovery.” reasonable to expect the development and min- The advance.
Recommended publications
  • Program Book
    Pacific Symposium on Biocomputing 2016 January 4-8, 2016 Big Island of Hawaii Program Book PACIFIC SYMPOSIUM ON BIOCOMPUTING 2016 Big Island of Hawaii, January 4-8, 2016 Welcome to PSB 2016! We have prepared this program book to give you quick access to information you need for PSB 2016. Enclosed you will find • Logistics information • Menus for PSB hosted meals • Full conference schedule • Call for Session and Workshop Proposals for PSB 2017 • Poster/abstract titles and authors • Participant List Conference materials are also available online at http://psb.stanford.edu/conference-materials/. PSB 2016 gratefully acknowledges the support the Institute for Computational Biology, a collaborative effort of Case Western Reserve University, the Cleveland Clinic Foundation, and University Hospitals; the National Institutes of Health (NIH), the National Science Foundation (NSF); and the International Society for Computational Biology (ISCB). If you or your institution are interested in sponsoring, PSB, please contact Tiffany Murray at [email protected] If you have any questions, the PSB registration staff (Tiffany Murray, Georgia Hansen, Brant Hansen, Kasey Miller, and BJ Morrison-McKay) are happy to help you. Aloha! Russ Altman Keith Dunker Larry Hunter Teri Klein Maryln Ritchie The PSB 2016 Organizers PACIFIC SYMPOSIUM ON BIOCOMPUTING 2016 Big Island of Hawaii, January 4-8, 2016 SPEAKER INFORMATION Oral presentations of accepted proceedings papers will take place in Salon 2 & 3. Speakers are allotted 10 minutes for presentation and 5 minutes for questions for a total of 15 minutes. Instructions for uploading talks were sent to authors with oral presentations. If you need assistance with this, please see Tiffany Murray or another PSB staff member.
    [Show full text]
  • Conference Proceedingssmall
    1 COMMITTEES Steering Committee Phil Bourne - University of California, San Diego Eric Davidson - California Institute of Technology Steven Salzberg - The Institute for Genomic Research John Wooley - University of California San Diego, San Diego Supercomputer Center Organizing Committee Pat Blauvelt – LSS Membership Director Karen Hauge – Palo Alto Medical Foundation, Local Arangements Kass Goldfein - Finance Consultant AlishaHolloway – The J. David Gladstone Institutes, Tutorial Chair Sami Khuri – San Jose State University, Poster Chair Ann Loraine – University of North Carolina at Charlotte, CSB Publication Chair Fenglou Mao – University of Georgia, On-Line Registration and Refereeing Website Peter Markstein – in silico Labs, Program Co-Chair Vicky Markstein - Life Sciences Society, Conference Chair, LSS President Jean Tsukamoto - Graphics Design Bill Wang - Sun Microsystems Inc, LSS Information Technology Director Ying Xu – University of Georgia, Program Co-Chair Program Committee Tatsuya Akutsu – Kyoto University Chris Bailey-Kellogg – Dartmouth College Pierre Baldi – University of California Irvine Liming Cai – University of Georgia Bill Cannon – Pacific Northwest National Laboratory Jake Chen – Indiana University Bhaskar DasGupta – University of Illinois Chicago Andrey Gorin – Oak Ridge National Laboratory Matt Hibbs – Princeton University Wen-Lian Hsu – Academia Sinica Tamer Kahveci – University of Florida Carl Kingsford – University of Maryland Christina Leslie – Memorial Sloan-Kettering Cancer Center Jing Li – Case Western Reserve
    [Show full text]
  • Pacific Symposium on Biocomputing 2020 January 3-7, 2020 Big Island of Hawaii
    Pacific Symposium on Biocomputing 2020 January 3-7, 2020 Big Island of Hawaii Program Book PACIFIC SYMPOSIUM ON BIOCOMPUTING 2020 Big Island of Hawaii, January 3-7, 2020 Welcome to PSB 2020! We have prepared this program book to give you quick access to information you need for PSB 2020. Enclosed you will find: • Logistics information • Medical services information • Full conference schedule (see website for the latest version) • Agendas for workshops and JBrowse demo • Call for Session and Workshop Proposals for PSB 2021 • Poster/abstract titles and author index • Participant list The latest conference materials are available online at http://psb.stanford.edu/conference-materials/. PSB 2020 gratefully acknowledges the support of the Cleveland Institute for Computational Biology; UPenn Institute for Biomedical Informatics; Variant Bio; Penn Center for Precision Medicine, Penn Medicine; Cipherome; Translational Bioinformatics Conference (TBC); the National Institutes of Health (NIH); and the International Society for Computational Biology (ISCB). If you or your institution are interested in sponsoring, PSB, please contact Tiffany Murray at [email protected] If you have any questions, the PSB registration staff (Tiffany Murray, Kasey Miller, BJ Morrison-McKay, and Cindy Paulazzo) are happy to help you. Aloha! Russ Altman Keith Dunker Larry Hunter Teri Klein Maryln Ritchie The PSB 2020 Organizers PACIFIC SYMPOSIUM ON BIOCOMPUTING 2020 Big Island of Hawaii, January 3-7, 2020 SPEAKER INFORMATION Oral presentations of accepted proceedings papers will take place in Salon 2 & 3. Speakers are allotted 10 minutes for presentation and 5 minutes for questions for a total of 15 minutes. Instructions for uploading talks were sent to authors with oral presentations.
    [Show full text]
  • Daniel Aalberts Scott Aa
    PLOS Computational Biology would like to thank all those who reviewed on behalf of the journal in 2015: Daniel Aalberts Jeff Alstott Benjamin Audit Scott Aaronson Christian Althaus Charles Auffray Henry Abarbanel Benjamin Althouse Jean-Christophe Augustin James Abbas Russ Altman Robert Austin Craig Abbey Eduardo Altmann Bruno Averbeck Hermann Aberle Philipp Altrock Ferhat Ay Robert Abramovitch Vikram Alva Nihat Ay Josep Abril Francisco Alvarez-Leefmans Francisco Azuaje Luigi Acerbi Rommie Amaro Marc Baaden Orlando Acevedo Ettore Ambrosini M. Madan Babu Christoph Adami Bagrat Amirikian Mohan Babu Frederick Adler Uri Amit Marco Bacci Boris Adryan Alexander Anderson Stephen Baccus Tinri Aegerter-Wilmsen Noemi Andor Omar Bagasra Vera Afreixo Isabelle Andre Marc Baguelin Ashutosh Agarwal R. David Andrew Timothy Bailey Ira Agrawal Steven Andrews Wyeth Bair Jacobo Aguirre Ioan Andricioaei Chris Bakal Alaa Ahmed Ioannis Androulakis Joseph Bak-Coleman Hasan Ahmed Iris Antes Adam Baker Natalie Ahn Maciek Antoniewicz Douglas Bakkum Thomas Akam Haroon Anwar Gabor Balazsi Ilya Akberdin Stefano Anzellotti Nilesh Banavali Eyal Akiva Miguel Aon Rahul Banerjee Sahar Akram Lucy Aplin Edward Banigan Tomas Alarcon Kevin Aquino Martin Banks Larissa Albantakis Leonardo Arbiza Mukul Bansal Reka Albert Murat Arcak Shweta Bansal Martí Aldea Gil Ariel Wolfgang Banzhaf Bree Aldridge Nimalan Arinaminpathy Lei Bao Helen Alexander Jeffrey Arle Gyorgy Barabas Alexander Alexeev Alain Arneodo Omri Barak Leonidas Alexopoulos Markus Arnoldini Matteo Barberis Emil Alexov
    [Show full text]
  • Advances in Rosetta Protein Structure Prediction on Massively Parallel Systems
    UC San Diego UC San Diego Previously Published Works Title Advances in Rosetta protein structure prediction on massively parallel systems Permalink https://escholarship.org/uc/item/87g6q6bw Journal IBM Journal of Research and Development, 52(1) ISSN 0018-8646 Authors Raman, S. Baker, D. Qian, B. et al. Publication Date 2008 Peer reviewed eScholarship.org Powered by the California Digital Library University of California Advances in Rosetta protein S. Raman B. Qian structure prediction on D. Baker massively parallel systems R. C. Walker One of the key challenges in computational biology is prediction of three-dimensional protein structures from amino-acid sequences. For most proteins, the ‘‘native state’’ lies at the bottom of a free- energy landscape. Protein structure prediction involves varying the degrees of freedom of the protein in a constrained manner until it approaches its native state. In the Rosetta protein structure prediction protocols, a large number of independent folding trajectories are simulated, and several lowest-energy results are likely to be close to the native state. The availability of hundred-teraflop, and shortly, petaflop, computing resources is revolutionizing the approaches available for protein structure prediction. Here, we discuss issues involved in utilizing such machines efficiently with the Rosetta code, including an overview of recent results of the Critical Assessment of Techniques for Protein Structure Prediction 7 (CASP7) in which the computationally demanding structure-refinement process was run on 16 racks of the IBM Blue Gene/Le system at the IBM T. J. Watson Research Center. We highlight recent advances in high-performance computing and discuss future development paths that make use of the next-generation petascale (.1012 floating-point operations per second) machines.
    [Show full text]
  • Precise Assembly of Complex Beta Sheet Topologies from De Novo 2 Designed Building Blocks
    1 Precise Assembly of Complex Beta Sheet Topologies from de novo 2 Designed Building Blocks. 3 Indigo Chris King*†, James Gleixner†, Lindsey Doyle§, Alexandre Kuzin‡, John F. Hunt‡, Rong Xiaox, 4 Gaetano T. Montelionex, Barry L. Stoddard§, Frank Dimaio†, David Baker† 5 † Institute for Protein Design, University of Washington, Seattle, Washington, United States 6 ‡ Biological Sciences, Northeast Structural Genomics Consortium, Columbia University, New York, New York, United 7 States 8 x Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Northeast Struc- 9 tural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States 10 § Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States 11 12 ABSTRACT 13 Design of complex alpha-beta protein topologies poses a challenge because of the large number of alternative packing 14 arrangements. A similar challenge presumably limited the emergence of large and complex protein topologies in evo- 15 lution. Here we demonstrate that protein topologies with six and seven-stranded beta sheets can be designed by inser- 16 tion of one de novo designed beta sheet containing protein into another such that the two beta sheets are merged to 17 form a single extended sheet, followed by amino acid sequence optimization at the newly formed strand-strand, strand- 18 helix, and helix-helix interfaces. Crystal structures of two such designs closely match the computational design models. 19 Searches for similar structures in the SCOP protein domain database yield only weak matches with different beta sheet 20 connectivities. A similar beta sheet fusion mechanism may have contributed to the emergence of complex beta sheets 21 during natural protein evolution.
    [Show full text]
  • I S C B N E W S L E T T
    ISCB NEWSLETTER FOCUS ISSUE {contents} President’s Letter 2 Member Involvement Encouraged Register for ISMB 2002 3 Registration and Tutorial Update Host ISMB 2004 or 2005 3 David Baker 4 2002 Overton Prize Recipient Overton Endowment 4 ISMB 2002 Committees 4 ISMB 2002 Opportunities 5 Sponsor and Exhibitor Benefits Best Paper Award by SGI 5 ISMB 2002 SIGs 6 New Program for 2002 ISMB Goes Down Under 7 Planning Underway for 2003 Hot Jobs! Top Companies! 8 ISMB 2002 Job Fair ISCB Board Nominations 8 Bioinformatics Pioneers 9 ISMB 2002 Keynote Speakers Invited Editorial 10 Anna Tramontano: Bioinformatics in Europe Software Recommendations11 ISCB Software Statement volume 5. issue 2. summer 2002 Community Development 12 ISCB’s Regional Affiliates Program ISCB Staff Introduction 12 Fellowship Recipients 13 Awardees at RECOMB 2002 Events and Opportunities 14 Bioinformatics events world wide INTERNATIONAL SOCIETY FOR COMPUTATIONAL BIOLOGY A NOTE FROM ISCB PRESIDENT This newsletter is packed with information on development and dissemination of bioinfor- the ISMB2002 conference. With over 200 matics. Issues arise from recommendations paper submissions and over 500 poster submis- made by the Society’s committees, Board of sions, the conference promises to be a scientific Directors, and membership at large. Important feast. On behalf of the ISCB’s Directors, staff, issues are defined as motions and are discussed EXECUTIVE COMMITTEE and membership, I would like to thank the by the Board of Directors on a bi-monthly Philip E. Bourne, Ph.D., President organizing committee, local organizing com- teleconference. Motions that pass are enacted Michael Gribskov, Ph.D., mittee, and program committee for their hard by the Executive Committee which also serves Vice President work preparing for the conference.
    [Show full text]
  • Top 100 AI Leaders in Drug Discovery and Advanced Healthcare Introduction
    Top 100 AI Leaders in Drug Discovery and Advanced Healthcare www.dka.global Introduction Over the last several years, the pharmaceutical and healthcare organizations have developed a strong interest toward applying artificial intelligence (AI) in various areas, ranging from medical image analysis and elaboration of electronic health records (EHRs) to more basic research like building disease ontologies, preclinical drug discovery, and clinical trials. The demand for the ML/AI technologies, as well as for ML/AI talent, is growing in pharmaceutical and healthcare industries and driving the formation of a new interdisciplinary industry (‘data-driven healthcare’). Consequently, there is a growing number of AI-driven startups and emerging companies offering technology solutions for drug discovery and healthcare. Another important source of advanced expertise in AI for drug discovery and healthcare comes from top technology corporations (Google, Microsoft, Tencent, etc), which are increasingly focusing on applying their technological resources for tackling health-related challenges, or providing technology platforms on rent bases for conducting research analytics by life science professionals. Some of the leading pharmaceutical giants, like GSK, AstraZeneca, Pfizer and Novartis, are already making steps towards aligning their internal research workflows and development strategies to start embracing AI-driven digital transformation at scale. However, the pharmaceutical industry at large is still lagging behind in adopting AI, compared to more traditional consumer industries -- finance, retail etc. The above three main forces are driving the growth in the AI implementation in pharmaceutical and advanced healthcare research, but the overall success depends strongly on the availability of highly skilled interdisciplinary leaders, able to innovate, organize and guide in this direction.
    [Show full text]
  • Bioengineering Professor Trey Ideker Wins 2009 Overton Prize
    Bioengineering Professor Trey Ideker Wins 2009 Overton Prize March 13, 2009 Daniel Kane University of California, San Diego bioengineering professor Trey Ideker-a network and systems biology pioneer-has won the International Society for Computational Biology's Overton Prize. The Overton prize is awarded each year to an early-to-mid-career scientist who has already made a significant contribution to the field of computational biology. Trey Ideker is an Associate Professor of Bioengineering at UC San Diego's Jacobs School of Engineering, Adjunct Professor of Computer Science, and member of the Moores UCSD Cancer Center. He is a pioneer in using genome-scale measurements to construct network models of cellular processes and disease. His recent research activities include development of software and algorithms for protein network analysis, network-level comparison of pathogens, and genome-scale models of the response to DNA-damaging agents. "Receiving this award is a wonderful honor and helps to confirm that the work we have been doing for the past several years has been useful to people," said Ideker. "This award also provides great recognition to UC San Diego which has fantastic bioinformatics programs both at the undergraduate and graduate level. I could never have done it without the help of some really first-rate bioinformatics and bioengineering graduate students," said Ideker. Ideker is on the faculty of the Jacobs School of Engineering's Department of Bioengineering, which ranks 2nd in the nationfor biomedical engineering, according to the latest US News rankings. The bioengineering department has ranked among the top five programs in the nation every year for the past decade.
    [Show full text]
  • A Glance Into the Evolution of Template-Free Protein Structure Prediction Methodologies Arxiv:2002.06616V2 [Q-Bio.QM] 24 Apr 2
    A glance into the evolution of template-free protein structure prediction methodologies Surbhi Dhingra1, Ramanathan Sowdhamini2, Frédéric Cadet3,4,5, and Bernard Offmann∗1 1Université de Nantes, CNRS, UFIP, UMR6286, F-44000 Nantes, France 2Computational Approaches to Protein Science (CAPS), National Centre for Biological Sciences (NCBS), Tata Institute for Fundamental Research (TIFR), Bangalore 560-065, India 3University of Paris, BIGR—Biologie Intégrée du Globule Rouge, Inserm, UMR_S1134, Paris F-75015, France 4Laboratory of Excellence GR-Ex, Boulevard du Montparnasse, Paris F-75015, France 5DSIMB, UMR_S1134, BIGR, Inserm, Faculty of Sciences and Technology, University of La Reunion, Saint-Denis F-97715, France Abstract Prediction of protein structures using computational approaches has been explored for over two decades, paving a way for more focused research and development of algorithms in com- parative modelling, ab intio modelling and structure refinement protocols. A tremendous suc- cess has been witnessed in template-based modelling protocols, whereas strategies that involve template-free modelling still lag behind, specifically for larger proteins (> 150 a.a.). Various improvements have been observed in ab initio protein structure prediction methodologies over- time, with recent ones attributed to the usage of deep learning approaches to construct protein backbone structure from its amino acid sequence. This review highlights the major strategies undertaken for template-free modelling of protein structures while discussing few tools devel- arXiv:2002.06616v2 [q-bio.QM] 24 Apr 2020 oped under each strategy. It will also briefly comment on the progress observed in the field of ab initio modelling of proteins over the course of time as seen through the evolution of CASP platform.
    [Show full text]
  • Deep Learning-Based Advances in Protein Structure Prediction
    International Journal of Molecular Sciences Review Deep Learning-Based Advances in Protein Structure Prediction Subash C. Pakhrin 1,†, Bikash Shrestha 2,†, Badri Adhikari 2,* and Dukka B. KC 1,* 1 Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA; [email protected] 2 Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA; [email protected] * Correspondence: [email protected] (B.A.); [email protected] (D.B.K.) † Both the authors should be considered equal first authors. Abstract: Obtaining an accurate description of protein structure is a fundamental step toward under- standing the underpinning of biology. Although recent advances in experimental approaches have greatly enhanced our capabilities to experimentally determine protein structures, the gap between the number of protein sequences and known protein structures is ever increasing. Computational protein structure prediction is one of the ways to fill this gap. Recently, the protein structure prediction field has witnessed a lot of advances due to Deep Learning (DL)-based approaches as evidenced by the suc- cess of AlphaFold2 in the most recent Critical Assessment of protein Structure Prediction (CASP14). In this article, we highlight important milestones and progresses in the field of protein structure prediction due to DL-based methods as observed in CASP experiments. We describe advances in various steps of protein structure prediction pipeline viz. protein contact map prediction, protein distogram prediction, protein real-valued distance prediction, and Quality Assessment/refinement. We also highlight some end-to-end DL-based approaches for protein structure prediction approaches. Additionally, as there have been some recent DL-based advances in protein structure determination Citation: Pakhrin, S.C.; Shrestha, B.; using Cryo-Electron (Cryo-EM) microscopy based, we also highlight some of the important progress Adhikari, B.; KC, D.B.
    [Show full text]
  • IG-VAE: Generative Modeling of Immunoglobulin Proteins by Direct
    bioRxiv preprint doi: https://doi.org/10.1101/2020.08.07.242347; this version posted August 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. IG-VAE: GENERATIVE MODELING OF IMMUNOGLOBULIN PROTEINS BY DIRECT 3D COORDINATE GENERATION Raphael R. Eguchi Namrata Anand Christian A. Choe Department of Biochemistry Department of Bioengineering Department of Bioengineering Stanford University Stanford University Stanford University [email protected] [email protected] [email protected] Po-Ssu Huang Department of Bioengineering Stanford University [email protected] ABSTRACT While deep learning models have seen increasing applications in protein science, few have been implemented for protein backbone generation—an important task in structure-based problems such as active site and interface design. We present a new approach to building class-specific backbones, using a variational auto-encoder to directly generate the 3D coordinates of immunoglobulins. Our model is torsion- and distance-aware, learns a high-resolution embedding of the dataset, and generates novel, high-quality structures compatible with existing design tools. We show that the Ig-VAE can be used to create a computational model of a SARS-CoV2-RBD binder via latent space sampling. We further demonstrate that the model’s generative prior is a powerful tool for guiding computational protein design, motivating a new paradigm under which backbone design is solved as constrained optimization problem in the latent space of a generative model. 1 Introduction Over the past two decades, the field of protein design has made steady progress, with new computational methods providing novel solutions to challenging problems such as enzyme catalysis, viral inhibition, de novo structure generation and more[1, 2, 3, 4, 5, 6, 7, 8, 9].
    [Show full text]