Genome Canada

Meeting Report from the Bioinformatics & Computational Biology Workshop
Toronto, Ontario, Canada – December 5 & 6, 2011

This workshop was made possible with the generous support of our sponsors.

Genome Canada 2011 Bioinformatics and Computational Biology Workshop

[Figure: wordle (word cloud) of the text of this report.] The wordle was created by Guillaume Bourque using the text of this report. It is meant to illustrate the kind of data mining approach that is relevant to bioinformatics.

Table of Contents

Executive Summary
Background
Process
Presentations
Theme Breakout Groups and Discussion
Strategy Session
Recommendations
Next Steps

Appendices

Appendix 1 – Workshop Program
Appendix 2 – Final List of Participants
Appendix 3 – Speaker Biographies

Organization Abbreviations

CANARIE: Canada’s Advanced Research and Innovation Network (www.canarie.ca)
CFI: Canada Foundation for Innovation (www.innovation.ca)
CIHR: Canadian Institutes of Health Research (www.cihr.ca)
EMBL: European Molecular Biology Laboratory (www.embl.org)
GC: Genome Canada (www.genomecanada.ca)
MITACS: Mathematics of Information Technology and Complex Systems (www.mitacs.ca)
NRC: National Research Council (www.nrc-cnrc.gc.ca)
OICR: Ontario Institute for Cancer Research (www.oicr.on.ca)

Executive Summary

On December 5 & 6, 2011, a workshop was held to bring together bioinformaticians and computational biologists, along with researchers from related disciplines, including biologists, mathematicians, statisticians, application developers, informatics specialists, data visualisation experts and machine learning specialists. The workshop was convened to gather input from a broad spectrum of stakeholder communities, as a first step in the creation of a multi-year road map for bioinformatics and computational biology in Canada. Led by a selected panel of presenters, participants in the workshop were charged with commenting upon opportunities in informatics-related genome research.

The principal recommendations from the workshop are (in priority order):

Funding: Genome Canada should take a lead role in coordinating the development of a significant, national, multi-year funding program directed to bioinformatics/computational biology.

Networking: Mechanisms should be established to improve coordination and promote interdisciplinary collaborations within the bioinformatics/computational biology community.

Integration: The Canadian bioinformatics community should develop and use data standards and best practices as necessary elements for data integration and modelling.

High Quality Personnel: Programs should be developed to attract, retain and train innovative individuals in the areas of bioinformatics, computational biology, and bio-statistics, who have an interest in working in the life sciences.

High Performance Computing: A coordinated and well-managed high-performance computing infrastructure that is targeted for life sciences should be supported.

Algorithm and Software Development: Algorithms and software must be developed with the end user in mind and based on established best practices.

Policies: The community should work closely with Genome Canada and Government agencies to ensure appropriate policies and legislation are in place to realize the full potential of Canada’s bio-economy.

Background

Genome Canada’s Science and Industry Advisory Committee (SIAC) identified the need to advance the area of bioinformatics/computational biology in Canada. The 2011 cross-Canada consultations in connection with Genome Canada’s strategic plan also highlighted the importance of a national effort to address the needs in this area. Therefore, SIAC undertook the planning for a bioinformatics/computational biology workshop scheduled for the fall of 2011. For many participants, this workshop was a singular opportunity to meet with bioinformaticians, computational biologists and colleagues from other related disciplines.

A decade ago, in September 2001, Genome Canada and the Canadian Institutes of Health Research held a jointly sponsored workshop on bioinformatics. At that time, bioinformatics expertise and technology in Canada were just emerging and were variable across the country.
Since the 2001 meeting, Genome Canada continued to encourage research in this area, mostly through its Science & Technology Innovation Centres (STICs) and funding competitions focused on technology development.

A ten-member steering committee was struck to organize the workshop.

Chair:

William (Bill) Crosby, SIAC Member
Professor of Biological Sciences, University of Windsor

Members:

Guillaume Bourque
Director of Bioinformatics, McGill University and Genome Quebec Innovation Centre

Francis Ouellette
Associate Director, Informatics and Bio-computing, Ontario Institute for Cancer Research
Associate Professor, Cell and Systems Biology, University of Toronto

Mark Daley
Departments of Computer Science and Biology, University of Western Ontario

Gijs van Rooijen
Chief Scientific Officer, Genome Alberta

Stacey Gabriel, SIAC Member
Director, Genetic Analysis Platform Program
Co-Director, Genome Sequence Analysis Program
Co-Director, Program in Medical and Population Genetics, Broad Institute

George Weinstock, GC Board of Directors
Associate Director, The Genome Center, Washington University School of Medicine

Michael Hallett, Advisor
Director, McGill Centre for Bioinformatics

John Yates III, SIAC Member
Department of Cell Biology, Scripps Research Institute

Steven Jones
Associate Director and Head, Bioinformatics, Genome Sciences Centre, British Columbia Cancer Research Centre

Jacques Simard, SIAC Chair, Committee Observer
Canada Research Chair in Oncogenetics
Director, Endocrinology and Genomics Axis, CHUQ Research Centre & Dept. Molecular Medicine, Laval

The Bioinformatics and Computational Biology Workshop was held on December 5 & 6, 2011 in Toronto. Sponsors included the Canadian Institutes of Health Research – Institute of Genetics and Institute for Cancer Research, and IBM Canada.

Workshop Objectives

Existing tools and approaches have only partially realized the information potential in existing data sets. A pan-national initiative in bioinformatics/computational biology will substantially and positively impact the life science economy in Canada, with benefits in human health as well as in non-health sectors such as agriculture, environment, fisheries, and forestry.

An emphasis of the workshop was to assemble an interdisciplinary group of biologists, mathematicians, statisticians, application developers, informatics specialists, data visualisation experts, machine learning specialists, and computational scientists dedicated to developing novel approaches to deriving value from genomics-related data, creating user-friendly interfaces, and establishing rich learning environments for the training and development of the highly qualified personnel required for this critical aspect of research. The importance of and need for infrastructure was also to be considered. An international dimension to the initiative is expected and encouraged.

The specific goals of the workshop were two-fold:

To inform Genome Canada during its development of a request for applications set to be launched in 2012.
To inform the development of a multi-year roadmap detailing the current state-of-the-art and future challenges and opportunities in bioinformatics.

Process

The Workshop Steering Committee chose to divide the subject matter into seven themes. An expert speaker for each theme was asked to present to participants an overview of the outstanding issues for the theme and to list the roadblocks, challenges and opportunities.
Theme | Speaker | Title of Talk
----- | ------- | -------------
1 Information Theory and Biological Computing | Lila Kari, University of Western Ontario | The Many Facets of Natural Computing
2 Network and Pathway Analysis | Gary Bader, University of Toronto | Network and Pathway Analysis – Moving Towards Applications
3 Ecology and Evolution | Magnus Nordborg, Gregor Mendel Institut | Genomic Approaches to Understanding Adaptation
4 Proteomics and Analysis of Data Sets | Andrew Emili, University of Toronto | Deriving Knowledge from Proteomic Data
5 Clinical Applications | John McPherson, OICR | The Rise of Personalized Medicine in Cancer: Implications and Challenges