Developing Bioinformatics Computer Skills
Cynthia Gibas
Per Jambeck

Publisher: O'Reilly
First Edition April 2001
ISBN: 1-56592-664-1, 446 pages

Copyright © 2001 O'Reilly & Associates, Inc. All rights reserved. Printed in the United States of America.

Published by O'Reilly & Associates, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly & Associates books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safari.oreilly.com). For more information contact our corporate/institutional sales department: 800-998-9938 or [email protected].

The O'Reilly logo is a registered trademark of O'Reilly & Associates, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps. The association between the image of a Caenorhabditis elegans and the topic of bioinformatics is a trademark of O'Reilly & Associates, Inc.

While every precaution has been taken in the preparation of this book, the publisher assumes no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Preface
    Audience for This Book
    Structure of This Book
    Our Approach to Bioinformatics
    URLs Referenced in This Book
    Conventions Used in This Book
    Comments and Questions
    Acknowledgments

Chapter 1. Biology in the Computer Age
    1.1 How Is Computing Changing Biology?
    1.2 Isn't Bioinformatics Just About Building Databases?
    1.3 What Does Informatics Mean to Biologists?
    1.4 What Challenges Does Biology Offer Computer Scientists?
    1.5 What Skills Should a Bioinformatician Have?
    1.6 Why Should Biologists Use Computers?
    1.7 How Can I Configure a PC to Do Bioinformatics Research?
    1.8 What Information and Software Are Available?
    1.9 Can I Learn a Programming Language Without Classes?
    1.10 How Can I Use Web Information?
    1.11 How Do I Understand Sequence Alignment Data?
    1.12 How Do I Write a Program to Align Two Biological Sequences?
    1.13 How Do I Predict Protein Structure from Sequence?
    1.14 What Questions Can Bioinformatics Answer?

Chapter 2. Computational Approaches to Biological Questions
    2.1 Molecular Biology's Central Dogma
    2.2 What Biologists Model
    2.3 Why Biologists Model
    2.4 Computational Methods Covered in This Book
    2.5 A Computational Biology Experiment

Chapter 3. Setting Up Your Workstation
    3.1 Working on a Unix System
    3.2 Setting Up a Linux Workstation
    3.3 How to Get Software Working
    3.4 What Software Is Needed?

Chapter 4. Files and Directories in Unix
    4.1 Filesystem Basics
    4.2 Commands for Working with Directories and Files
    4.3 Working in a Multiuser Environment

Chapter 5. Working on a Unix System
    5.1 The Unix Shell
    5.2 Issuing Commands on a Unix System
    5.3 Viewing and Editing Files
    5.4 Transformations and Filters
    5.5 File Statistics and Comparisons
    5.6 The Language of Regular Expressions
    5.7 Unix Shell Scripts
    5.8 Communicating with Other Computers
    5.9 Playing Nicely with Others in a Shared Environment

Chapter 6. Biological Research on the Web
    6.1 Using Search Engines
    6.2 Finding Scientific Articles
    6.3 The Public Biological Databases
    6.4 Searching Biological Databases
    6.5 Depositing Data into the Public Databases
    6.6 Finding Software
    6.7 Judging the Quality of Information

Chapter 7. Sequence Analysis, Pairwise Alignment, and Database Searching
    7.1 Chemical Composition of Biomolecules
    7.2 Composition of DNA and RNA
    7.3 Watson and Crick Solve the Structure of DNA
    7.4 Development of DNA Sequencing Methods
    7.5 Genefinders and Feature Detection in DNA
    7.6 DNA Translation
    7.7 Pairwise Sequence Comparison
    7.8 Sequence Queries Against Biological Databases
    7.9 Multifunctional Tools for Sequence Analysis

Chapter 8. Multiple Sequence Alignments, Trees, and Profiles
    8.1 The Morphological to the Molecular
    8.2 Multiple Sequence Alignment
    8.3 Phylogenetic Analysis
    8.4 Profiles and Motifs

Chapter 9. Visualizing Protein Structures and Computing Structural Properties
    9.1 A Word About Protein Structure Data
    9.2 The Chemistry of Proteins
    9.3 Web-Based Protein Structure Tools
    9.4 Structure Visualization
    9.5 Structure Classification
    9.6 Structural Alignment
    9.7 Structure Analysis
    9.8 Solvent Accessibility and Interactions
    9.9 Computing Physicochemical Properties
    9.10 Structure Optimization
    9.11 Protein Resource Databases
    9.12 Putting It All Together

Chapter 10. Predicting Protein Structure and Function from Sequence
    10.1 Determining the Structures of Proteins
    10.2 Predicting the Structures of Proteins