Introduction to Bioinformatics Using Action Labs

BIOINFORMATICS Action Labs Jean-Louis Lassez, Ryan Rossi, and Stephen Sheel 2 | BINF Action Labs Preface Bioinformatics is the application of computational techniques and tools to analyze and manage biological data. This book provides an introduction to bioinformatics through the use of Action Labs. These labs allow students to get experience using real data and tools to solve difficult problems. The book comes with supplementary software tools and papers. The labs use data from Breast Cancer, Liver Disease, Diabetes, SARS, HIV, Extinct Organisms, and many others. The book has been written for first or second year computer science, mathematics, and biology students. The supplementary software and papers can be found at http://www.kibazen.com/binf Jean-Louis Lassez: “Life is Pachinko” at the Kinsey Institute Museum 3 | BINF Table of Contents Chapter 1. Introduction to Bioinformatics........................................................................6 What is Bioinformatics................................................................................................................................................. 7 Exploring Frameshifts .................................................................................................................................................. 9 Bioinformatics Tools.................................................................................................................................................. 11 Chapter 2. Introduction to BLAST and FASTA.............................................................13 Introduction to Sequence Analysis............................................................................................................................. 14 Exploring BLAST #1 ................................................................................................................................................. 17 Exploring BLAST #2 ................................................................................................................................................. 19 Exploring FASTA ...................................................................................................................................................... 21 Chapter 3. BLAST Analysis and Applications...............................................................23 Understanding the BLAST Programs......................................................................................................................... 24 Sequence Information Searching................................................................................................................................ 26 Genes and Diseases .................................................................................................................................................... 28 Protein Database Searching........................................................................................................................................ 30 Jurassic Park............................................................................................................................................................... 32 HIV Patient Research ................................................................................................................................................. 34 SARS Exploration...................................................................................................................................................... 36 Extinct Organism Research........................................................................................................................................ 38 Cancer BLAST Experiment ....................................................................................................................................... 40 Scoring Matrices ........................................................................................................................................................ 42 Chapter 4. Advanced Bioinformatics Tools....................................................................45 Introduction to CLUSTAL......................................................................................................................................... 46 Introduction to PSI-BLAST ....................................................................................................................................... 49 Translation Initiation Sites ......................................................................................................................................... 51 Protein Structure......................................................................................................................................................... 53 Protein Function ......................................................................................................................................................... 55 Chapter 5. Classification and Pattern Recognition.......................................................57 K-means Clustering.................................................................................................................................................... 58 Hierarchical Clustering............................................................................................................................................... 60 Introduction to Maxim Toy and Data Modifier.......................................................................................................... 62 4 | BINF Action Labs Introduction to Universal Frame Finder..................................................................................................................... 69 Maxim Parameter Project........................................................................................................................................... 74 Breast Cancer Analysis .............................................................................................................................................. 77 Liver Disease Analysis............................................................................................................................................... 80 Hepatitis Analysis ...................................................................................................................................................... 82 Chapter 6. Advanced Topics..............................................................................................84 HMM and Protein Sequence Analysis ....................................................................................................................... 85 Introduction to Pfam................................................................................................................................................... 87 Singular Value Decomposition and Latent Semantic Analysis.................................................................................. 89 Appendix. Supplementary Papers ...................................................................................98 Crick’s Hypothesis Revisited: The Existence of a Universal Coding Frame............................................................. 99 Automated Discovery of Translation Initiation Sites and Promoter Sequences in Bacterial Genomes................... 106 Glossary ...............................................................................................................................121 Index.....................................................................................................................................126 The software and supplementary papers are located at http://www.kibazen.com/binf 5 | BINF Chapter 1 Introduction to Bioinformatics 6 | BINF Introduction to Bioinformatics What is Bioinformatics Background: What is Bioinformatics? It depends on who you are talking to. A geneticist, a biologist, a mathematician, a CEO of a pharmaceutical company and a computer scientist all would have related, but different, opinions as to what Bioinformatics is. Purpose: This lab introduces various aspects of Bioinformatics, its scientific basis, its techniques and its applications. Resources: There are many excellent resources on Bioinformatics that can be found on the web. Visit, for instance, the tutorial located at: http://www.ebi.ac.uk/2can/bioinformatics/bioinf_what_1.html Key Terms: • Genome • Codon • Gene • Prokaryote • Protein • Eukaryote • Amino Acid • Archaea • DNA • RNA Directions: Read the tutorial in the resources (or equivalent) thoroughly. Exercises: 1. Give a concise, yet precise, definition of Bioinformatics. 2. What are the biggest challenges facing Bioinformatics? Why do you think this is the case? 3. Give a list of the main biological databases that can be accessed on the internet. 4. What are the differences in the functions of the various biological databases? 5. Name the categories of the major data analysis tool. 7 | BINF 6. How are the sequence analysis tools used in Bioinformatics? 7. Make a list of the most important real-world applications for Bioinformatics. Rank your choices from 1-10 and justify why, in your view, the application received its ranking (As the ranking is subjective and tied to your taste or expertise, what matters most is not the ranking you choose but the justifications you give). References: 1. European Molecular Biology Laboratory (EMBL).”What is Bioinformatics?”. <http://www.ebi.ac.uk/2can/bioinformatics/bioinf_what_1.html>. 2. Genetic Home

Introduction to Bioinformatics Using Action Labs

Toolboxes for a Standardised and Systematic Study of Glycans

Life with 6000 Genes Author(S): A. Goffeau, B. G. Barrell, H. Bussey, R

Comprehensive Analysis of CRISPR/Cas9-Mediated Mutagenesis in Arabidopsis Thaliana by Genome-Wide Sequencing

Introduction to Bioinformatics (Elective) – SBB1609

Embnet Conference 2020 Programme

Concepts, Historical Milestones and the Central Place of Bioinformatics in Modern Biology: a European Perspective

The Uniprot Knowledgebase BLAST

Integrative Analysis of Transcriptomic Data for Identification of T-Cell

Bioinformatics: a Practical Guide to the Analysis of Genes and Proteins, Second Edition Andreas D

Embnet.News Volume 4 Nr

Biomolecule and Bioentity Interaction Databases in Systems Biology: a Comprehensive Review

Embnet.News Volume 5 Nr