Bioinformatics
BookID <BID>_ChapID <CID>_Proof# 1 - 29/08/2009

David Edwards ● Jason Stajich ● David Hansen
Editors

Bioinformatics
Tools and Applications

Editors

David Edwards
Australian Centre for Plant Functional Genomics
Institute for Molecular Biosciences and School of Land, Crop and Food Sciences
University of Queensland
Brisbane, QLD 4072
Australia

David Hansen
Australian E-Health Research Centre
CSIRO
Qld 4027, Brisbane, Australia

Jason Stajich
Department of Plant Pathology and Microbiology
University of California
Berkeley, CA
USA

ISBN 978-0-387-92737-4
e-ISBN 978-0-387-92738-1
DOI 10.1007/978-0-387-92738-1
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2009927717

© Springer Science+Business Media, LLC 2009
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Biology has progressed tremendously in the last decade due in part to the increased automation in the generation of data from sequences to genotypes to phenotypes.
Biology is now very much an information science, and bioinformatics provides the means to connect biological data to hypotheses. Within this volume, we have collated chapters describing various areas of applied bioinformatics, from the analysis of sequence, literature, and functional data to the function and evolution of organisms. The ability to process and interpret large volumes of data is essential now that new high-throughput DNA sequencers are producing an overload of sequence data.

Initial chapters provide an introduction to the analysis of DNA and protein sequences, from motif detection to gene prediction and annotation, with specific chapters on DNA and protein databases as well as data visualization. Additional chapters focus on gene expression analysis from the perspective of traditional microarrays and more recent sequence-based approaches, followed by an introduction to the evolving field of phenomics, with specific chapters detailing advances in plant and microbial phenome analysis and a chapter dealing with the important issue of standards for functional genomics. Further chapters present literature databases and associated mining tools, which are becoming increasingly essential for interpreting the vast volume of published biological information, while the final chapters present bioinformatics purely from a developer’s point of view, describing the various data and databases as well as common programming languages used for bioinformatics applications. These chapters provide an introduction and motivation to further avenues for implementation.

Together, this volume aims to provide a resource for biology students wanting a greater understanding of the encroaching area of bioinformatics, as well as computer scientists who are interested in learning more about the field of applied bioinformatics.

David Edwards, Brisbane, QLD
Jason E. Stajich, Berkeley, CA
David Hansen, Brisbane, QLD

Contents

1 DNA Sequence Databases
  David Edwards, David Hansen, and Jason E. Stajich
2 Sequence Comparison Tools
  Michael Imelfort
3 Genome Browsers
  Sheldon McKay and Scott Cain
4 Predicting Non-coding RNA Transcripts
  Laura A. Kavanaugh and Uwe Ohler
5 Gene Prediction Methods
  William H. Majoros, Ian Korf, and Uwe Ohler
6 Gene Annotation Methods
  Laurens Wilming and Jennifer Harrow
7 Regulatory Motif Analysis
  Alan Moses and Saurabh Sinha
8 Molecular Marker Discovery and Genetic Map Visualisation
  Chris Duran, David Edwards, and Jacqueline Batley
9 Sequence Based Gene Expression Analysis
  Lakshmi K. Matukumalli and Steven G. Schroeder
10 Protein Sequence Databases
  Terry Clark
11 Protein Structure Prediction
  Sitao Wu and Yang Zhang
12 Classification of Information About Proteins
  Amandeep S. Sidhu, Matthew I. Bellgard, and Tharam S. Dillon
13 High-Throughput Plant Phenotyping – Data Acquisition, Transformation, and Analysis
  Matthias Eberius and José Lima-Guerra
14 Phenome Analysis of Microorganisms
  Christopher M. Gowen and Stephen S. Fong
15 Standards for Functional Genomics
  Stephen A. Chervitz, Helen Parkinson, Jennifer M. Fostel, Helen C. Causton, Susanna-Assunta Sansone, Eric W. Deutsch, Dawn Field, Chris F. Taylor, Philippe Rocca-Serra, Joe White, and Christian J. Stoeckert
16 Literature Databases
  J. Lynn Fink
17 Advanced Literature-Mining Tools
  Pierre Zweigenbaum and Dina Demner-Fushman
18 Data and Databases
  Daniel Damian
19 Programming Languages
  John Boyle
Index

Contributors

Jacqueline Batley
Australian Centre for Plant Functional Genomics, Centre of Excellence for Integrative Legume Research, School of Land, Crop and Food Sciences, University of Queensland, Brisbane, QLD 4072, Australia

John Boyle
The Institute for Systems Biology, 1441 North 34th Street, Seattle, WA 98105, USA

Matthew Bellgard
Centre for Comparative Genomics, Murdoch University, Perth, WA, Australia

Scott Cain
Ontario Institute for Cancer Research, 101 College Street, Suite 800, Toronto, ON, Canada M5G0A3

Helen C. Causton
MRC Clinical Sciences Centre, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK

Stephen A. Chervitz
Affymetrix Inc., Santa Clara, CA 95051, USA

Terry Clark
Australian Centre for Plant Functional Genomics, Institute for Molecular Biosciences and School of Land, Crop and Food Sciences, University of Queensland, Brisbane, QLD 4072, Australia

Daniel Damian
Biowisdom Ltd., CB 22 7GG, Cambridge, UK

Dina Demner-Fushman
Communications Engineering Branch, Lister Hill National Center for Biomedical Communications, US National Library of Medicine, Bethesda, MD, USA

Eric W. Deutsch
The Institute for Systems Biology, Seattle, WA 98105, USA

Tharam Dillon
Digital Ecosystems and Business Intelligence Institute, Curtin University of Technology, Perth, WA, Australia

Chris Duran
Australian Centre for Plant Functional Genomics, School of Land, Crop and Food Sciences, University of Queensland, Brisbane, QLD 4072, Australia

Matthias Eberius
LemnaTec GmbH, Schumanstr. 1a, 52146 Wuerselen, Germany

David Edwards
Australian Centre for Plant Functional Genomics, Institute for Molecular Biosciences and School of Land, Crop and Food Sciences, University of Queensland, Brisbane, QLD 4072, Australia

Dawn Field
Natural Environment Research Council, Centre for Ecology and Hydrology, Oxford, OX1 3SR, UK

J. Lynn Fink
Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, CA, USA

Stephen S. Fong
Department of Chemical and Life Science Engineering, Virginia Commonwealth University, P.O. Box 843028, Richmond, VA 23284, USA

Jennifer M. Fostel
Division