Plant METHODS IN MOLECULAR BIOLOGYTM

John M. Walker, SERIES EDITOR

419. Post-Transcriptional Regulation, edited by 390. Targeting Protocols, Second Edition, edited Jeffrey Wilusz, 2008 by Mark van der Giezen, 2007 418. Avidin–Biotin Interactions: Methods and 389. Pichia Protocols, Second Edition, edited by James Applications, edited by Robert J. McMahon, 2008 M. Cregg, 2007 417. Tissue Engineering, Second Edition, edited by 388. Baculovirus and Insect Expression Protocols, Hannsjörg Hauser and Martin Fussenegger, 2007 Second Edition, edited by David W. 416. Gene Essentiality: Protocols and Bioinformatics, Murhammer, 2007 edited by Andrei L. Osterman, 2008 387. Serial Analysis of Gene Expression (SAGE): Digital 415. Innate Immunity, edited by Jonathan Ewbank and Gene Expression Profiling, edited by Kare Lehmann Eric Vivier, 2007 Nielsen, 2007 414. Apoptosis in : Methods and Protocols, edited 386. Peptide Characterization and Application by Gil Mor and Ayesha Alvero, 2008 Protocols edited by Gregg B. Fields, 2007 413. Protein Structure Prediction, Second Edition, 385. Microchip-Based Assay Systems: Methods and edited by Mohammed Zaki and Chris Bystroff, 2008 Applications, edited by Pierre N. Floriano, 2007 412. Neutrophil Methods and Protocols, edited by Mark 384. Capillary Electrophoresis: Methods and Protocols, T. Quinn, Frank R. DeLeo, and Gary M. edited by Philippe Schmitt-Kopplin, 2007 Bokoch, 2007 383. Cancer and Proteomics: Methods and 411. Reporter for Mammalian Systems, edited by Protocols, edited by Paul B. Fisher, 2007 Don Anson, 2007 382. Microarrays, Second Edition: Volume 2, 410. Environmental Genomics, edited by Cristofre C. Applications and Data Analysis, edited by Jang Martin, 2007 B. Rampal, 2007 409. Immunoinformatics: Predicting Immunogenicity In 381. Microarrays, Second Edition: Volume 1, Synthesis Silico, edited by Darren R. Flower, 2007 Methods, edited by Jang B. Rampal, 2007 408. Gene Function Analysis, edited by Michael Ochs, 380. Immunological Tolerance: Methods and Protocols, 2007 edited by Paul J. Fairchild, 2007 407. Stem Cell Assays, edited by Vemuri C. Mohan, 2007 379. Glycovirology Protocols edited by Richard 406. Plant Bioinformatics: Methods and Protocols, edited J. Sugrue, 2007 by David Edwards, 2007 378. Monoclonal Antibodies: Methods and Protocols, 405. Telomerase Inhibition: Strategies and Protocols, edited by Maher Albitar, 2007 edited by Lucy Andrews and Trygve O. Tollefsbol, 377. Microarray Data Analysis: Methods and 2007 Applications, edited by Michael J. Korenberg, 2007 404. Topics in , edited by Walter T. 376. Linkage Disequilibrium and Association Mapping: Ambrosius, 2007 Analysis and Application, edited by Andrew 403. Patch-Clamp Methods and Protocols, edited by R. Collins, 2007 Peter Molnar and James J. Hickman, 2007 375. In Vitro Transcription and Translation Protocols: 402. PCR Primer Design, edited by Anton Yuryev, 2007 Second Edition, edited by Guido Grandi, 2007 401. Neuroinformatics, edited by Chiquito J. 374. Quantum Dots: edited by Marcel Bruchez and Crasto, 2007 Charles Z. Hotz, 2007 400. Methods in Lipid Membranes, edited by Alex 373. Pyrosequencing® Protocols edited by Sharon Dopico, 2007 Marsh, 2007 399. Neuroprotection Methods and Protocols, edited by 372. Mitochondria: Practical Protocols, edited by Dario Tiziana Borsello, 2007 Leister and Johannes Herrmann, 2007 398. Lipid Rafts, edited by Thomas J. McIntosh, 2007 371. Biological Aging: Methods and Protocols, edited by 397. Hedgehog Signaling Protocols, edited by Jamila Trygve O. Tollefsbol, 2007 I. Horabin, 2007 370. Adhesion Protein Protocols, Second Edition, edited 396. Comparative Genomics, Volume 2, edited by by Amanda S. Coutts, 2007 Nicholas H. Bergman, 2007 369. Electron Microscopy: Methods and Protocols, 395. Comparative Genomics, Volume 1, edited by Second Edition, edited by John Kuo, 2007 Nicholas H. Bergman, 2007 368. Cryopreservation and Freeze-Drying Protocols, 394. Salmonella: Methods and Protocols, edited by Heide Second Edition, edited by John G. Day and Glyn Schatten and Abe Eisenstark, 2007 Stacey, 2007 393. Plant Secondary Metabolites, edited by Harinder 367. Mass Spectrometry Data Analysis in Proteomics P. S. Makkar, P. Siddhuraju, and Klaus edited by Rune Matthiesen, 2007 Becker, 2007 366. Cardiac Gene Expression: Methods and Protocols, 392. Molecular Motors: Methods and Protocols, edited by edited by Jun Zhang and Gregg Rokosh, 2007 Ann O. Sperry, 2007 365. Protein Phosphatase Protocols: edited by Greg 391. MRSA Protocols, edited by Yinduo Ji, 2007 Moorhead, 2007 METHODS IN MOLECULAR BIOLOGYTM

Plant Bioinformatics

Methods and Protocols

Edited by David Edwards Australian Centre for Plant , Institute for Molecular Biosciences and School of Land, Crop and Food Sciences, University of Queensland, Brisbane, Australia ©2007 Humana Press Inc. 999 Riverview Drive, Suite 208 Totowa, New Jersey 07512 www.humanapress.com

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise without written permission from the Publisher. Methods in Molecular BiologyTM is a trademark of The Humana Press Inc.

All papers, comments, opinions, conclusions, or recommendations are those of the author(s), and do not necessarily reflect the views of the publisher.

This publication is printed on acid-free paper. ANSI Z39.48-1984 (American Standards Institute) Permanence of Paper for Printed Library Materials

Production Editor: Christina M. Thomas Cover design by Karen Schulz

For additional copies, pricing for bulk purchases, and/or information about other Humana titles, contact Humana at the above address or at any of the following numbers: Tel.: 973-256-1699; Fax: 973-256-8341; E-mail: [email protected]; or visit our Website: www.humanapress.com

Photocopy Authorization Policy: Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by Humana Press Inc., provided that the base fee of US $30 copy is paid directly to the Copyright Clearance Center at 222 Rosewood Drive, Danvers, MA 01923. For those organizations that have been granted a photocopy license from the CCC, a separate system of payment has been arranged and is acceptable to Humana Press Inc. The fee code for users of the Transactional Reporting Service is: [978-1-58829-653-5/07 $30]. Printed in the of America. 10987654321

Library of Congress Control Number: 2007933141. Preface

Bioinformatics is a new and rapidly evolving field of research that is being driven by the development and application of various omics technologies. The term bioinformatics means different things to different people, and following the theme of this series, this volume focuses on applied bioinformatics with specific applications to crops and model plants. This volume is aimed at plant biologists who have an interest in or requirement for accessing the huge amount of data being generated by high-throughput technologies. The volume would also be of interest to bioinformaticians and computer who would benefit from an introduction to the different tools and systems available for plant research. Bioinformatics was initially driven by the need to understand and exploit the information being generated by gene and sequencing projects, the scope of bioinformatics now extends from the genome to the phenome. It is the integration of information relating to heritable agronomic traits, including important metabolic profiles, with the emerging genome and transcriptome data that will drive plant research and bioinformatics developments into the future. One observation during the production of this volume is the increasing integration of both related data from different plants and also the integration of multiple diverse forms of data. I expect this trend to continue as this field of research continues to develop in the future. Scientific research by its nature progresses and changes, and this is especially true for bioinformatics. The rapid evolution of the tools and systems described in this volume may change during the lifetime of this edition, and it is suggested that the reader consults the relevant web pages directly as they frequently host detailed list of updates and changes. David Edwards Contents

Preface...... v Contributors ...... xi

1. The EMBL Nucleotide Sequence and Genome Reviews Databases Peter Sterk, Tamara Kulikova, Paul Kersey, and ..... 1 2. Using GenBank David Wheeler ...... 23 3. A Collection of Plant-Specific Genomic Data and Resources at NCBI Tatiana Tatusova, Brian Smith-White, and James Ostell ...... 61 4. UniProtKB/Swiss-Prot Emmanuel Boutet, Damien Lieberherr, Michael Tognolli, Michel Schneider, and ...... 89 5. Plant Database Resources at The Institute for Genomic Research Agnes P. Chan, Pablo D. Rabinowicz, John Quackenbush, C. Robin Buell, and Chris D. Town ...... 113 6. MIPS Plant Genome Information Resources Manuel Spannagl, Georg Haberer, Rebecca Ernst, Heiko Schoof, and Klaus F. X. Mayer...... 137 7. HarvEST Timothy J. Close, Steve Wanamaker, Mikeal L. Roose, and Matthew Lyon ...... 161 8. The TAIR Database Rebecca L. Poole ...... 179 9. AtEnsEMBL Nick James, Neil Graham, Debbie Clements, Beatrice Schildknecht, and Sean May ...... 213 10. Accessing Integrated Brassica Genetic and Genomic Data Using the BASC Server Christopher G. Love and David Edwards ...... 229 viii Contents

11. Leveraging Model Legume Information to Find Candidate Genes for Soybean Sudden Death Syndrome Using the Legume Information System Michael D. Gonzales, Kamal Gajendran, Andrew D. Farmer, Eric Archuleta, and William D. Beavis ...... 245 12. Legume Resources: MtDB and Medicago.Org Ernest F. Retzel, James E. Johnson, John A. Crow, Anne F. Lamblin, and Charles E. Paule ...... 261 13. BGI-RIS V2 Ximiao He and Jun Wang ...... 275 14. GrainGenes Helen O’Sullivan ...... 301 15. Gramene Doreen Ware...... 315 16. MaizeGDB Carolyn J. Lawrence ...... 331 17. BarleyBase/PLEXdb Roger P. Wise, Rico A. Caldo, Lu Hong, Lishuang Shen, Ethalinda Cannon, and Julie A. Dickerson ...... 347 18. Reaping the Benefits of SAGE Stephen J. Robinson, Justin D. Guenther, Christopher T. Lewis, Matthew G. Links and Isobel A.P. Parkin ...... 365 19. Methods for Analysis of Gene Expression in Plants Using MPSS Kan Nobuta, Kalyan Vemaraju, and Blake C. Meyers ...... 387 20. Metabolomics Data Analysis, Visualization, and Integration Lloyd W. Sumner, Ewa Urbanczyk-Wochniak, and Corey D. Broeckling...... 409 21. KEGG Bioinformatics Resource for Plant Genomics Research Ali Masoudi-Nejad, Susumu Goto, Takashi R. Endo, and Minoru Kanehisa ...... 437 22. International Crop Information System for Germplasm Data Management Arllet Portugal, Ranjan Balachandra, Thomas Metz, Richard Bruskiewich, and Graham McLaren ...... 459 Contents ix

23. Automated Discovery of Single Nucleotide Polymorphism and Simple Sequence Repeat Molecular Genetic Markers Jacqueline Batley, Erica Jewell, and David Edwards ...... 473 24. Methods for Gene Ontology Annotation Emily Dimmer, Tanya Z. Berardini, Daniel Barrell, and Evelyn Camon ...... 495 25. Gene Structure Annotation at PlantGDB Volker Brendel ...... 521 26. An Introduction to BioPerl Jason E. Stajich ...... 535 Index ...... 549 Contributors

Rolf Apweiler • EMBL-Outstation The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK Eric Archuleta • National Center for Genome Resources, Santa Fe, NM Amos Bairoch • Department of Structural and Bioinformatics, Centre Medical Universitaire, Swiss Institute of Bioinformatics and University of Geneva, Geneva, Switzerland Ranjan Balachandra • Department of Primary Industries, Australian Temperate Field Crops Collection, Horsham Victoria, Australia Jacqueline Batley • Australian Centre for Plant Functional Genomics, School of Land, Crop and Food Sciences and ARC Centre of Excellence for Integrative Legume Research, CILR, The University of Queensland, Brisbane, Australia William D. Beavis • National Center for Genome Resources, Santa Fe, NM Tanya Z. Berardini • Department of Plant Biology, The Arabidopsis Information Resource, Carnegie Institution of Washington, Stanford, CA Emmanuel Boutet • Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland Volker Brendel • Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA Corey D. Broeckling • The Samuel Roberts Noble Foundation, Plant Biology Division, Sam Noble Parkway, Ardmore, OK, and Department of Horticulture and Landscape Architecture, Colorado State University, Fort Collins, CO Richard Bruskiewich • International Rice Research Institute, Metro Manila, Philippines C. Robin Buell • The Institute for Genomic Research, Rockville, MD Rico A. Caldo • Department of Plant Pathology and Center for Plant Responses to Environmental Stresses, Iowa State University, Ames, IA Evelyn Camon • European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK Agnes P. Chan • The Institute for Genomic Research, Rockville, MD Debbie Clements • Nottingham Arabidopsis Stock Centre, School of Biosciences, University of Nottingham, Loughborough, UK xii Contributors

Timothy J. Close • University of California, Riverside, CA John A. Crow • Center for Computational Genomics and Bioinformatics, University of Minnesota, Minneapolis, MN Julie A. Dickerson • Virtual Reality Applications Center and Department of Electrical and Computer Engineering, Iowa State University, Ames, IA Emily Dimmer • European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK David Edwards • Australian Centre for Plant Functional Genomics, Institute for Molecular Biosciences and School of Land, Crop and Food Sciences, University of Queensland, Brisbane, Australia Takashi R. Endo • Laboratory of Bioknowledge Systems, Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho Uji, Kyoto, Rebecca Ernst • MIPS Institute for Bioinformatics, GSF Research Center for Environment and Health, Neuherberg, Germany Andrew D. Farmer • National Center for Genome Resources, Santa Fe, NM Kamal Gajendran • National Center for Genome Resources, Santa Fe, NM Michael D. Gonzales • National Center for Genome Resources, Santa Fe, NM Susumu Goto • Laboratory of Plant Genetics, Division of Applied Bioscience, Kyoto University, Kitashirakawa Oiwake-cho, Kyoto, Japan Neil Graham • Nottingham Arabidopsis Stock Centre, School of Biosciences, University of Nottingham, Loughborough, UK Justin D. Guenther • Agriculture and Agri-Food, Saskatoon Research Centre, Saskatoon, Saskatchewan, Canada Georg Haberer • MIPS Institute for Bioinformatics, GSF Research Center for Environment and Health, Neuherberg, Germany Ximiao He • Beijing Genomics Institute, Chinese Academy of Sciences, Beijing, China Lu Hong • Interdepartmental Bioinformatics & , Iowa State University, Ames, IA David V. Huhman • The Samuel Roberts Noble Foundation, Plant Biology Division, Sam Noble Parkway, Ardmore, OK Nick James • Nottingham Arabidopsis Stock Centre, School of Biosciences, University of Nottingham, Loughborough, UK Erica Jewell • Department of Primary Industries, Primary Industries Research Victoria, Animal Genetics and Genomics, Attwood, Victoria, Australia Contributors xiii

James E. Johnson • Center for Computational Genomics and Bioinformatics, University of Minnesota, Minneapolis, MN Minoru Kanehisa • Laboratory of Plant Genetics, Division of Applied Bioscience, Kyoto University, Kitashirakawa Oiwake-cho, Kyoto, Japan Paul Kersey • EMBL-Outstation The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK Tamara Kulikova • EMBL-Outstation The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK Anne F. Lamblin • Center for Computational Genomics and Bioinformatics, University of Minnesota, Minneapolis, MN Carolyn J. Lawrence • Genetics Building, Iowa State University, Ames, IA Christopher T. Lewis • Agriculture and Agri-Food, Saskatoon Research Centre, Saskatoon, Saskatchewan, Canada Damien Lieberherr • Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland Christopher G. Love • Primary Industries Research Victoria, Plant Genetics and Genomics, Victorian AgriBiosciences Centre, La Trobe University, Bundoora, Victoria, Australia Matthew Lyon • University of California, Riverside, CA Ali Masoudi-Nejad • Laboratory of Bioknowledge Systems, Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho Uji, Kyoto, Japan, and Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran Sean May • Nottingham Arabidopsis Stock Centre, School of Biosciences, University of Nottingham, Loughborough, UK Klaus F. X. Mayer • MIPS Institute for Bioinformatics, GSF Research Center for Environment and Health, Neuherberg, Germany Graham McLaren • International Rice Research Institute, Metro Manila, Philippines Thomas Metz • International Rice Research Institute, Metro Manila, Philippines Blake C. Meyers • Delaware Biotechnology Institute, University of Delaware, Newark, DE Kan Nobuta • Delaware Biotechnology Institute, University of Delaware, Newark, DE Helen O’Sullivan • ACPFG Australian Centre for Plant Functional Genomics, University of Adelaide, Urrbrae, Australia and Department of Biological Sciences, University of Bristol, Bristol, UK xiv Contributors

James Ostell • National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD Isobel A. P. Parkin • Agriculture and Agri-Food, Saskatoon Research Centre, Saskatoon, Saskatchewan, Canada Charles E. Paule • Center for Computational Genomics and Bioinformatics, University of Minnesota, Minneapolis, MN Rebecca L. Poole • Biological Sciences Building, Clifton, Bristol, UK Arllet Portugal • International Rice Research Institute, Metro Manila, Philippines John Quackenbush • The Institute for Genomic Research, Rockville, MD Pablo D. Rabinowicz • The Institute for Genomic Research, Rockville, MD Ernest F. Retzel • Center for Computational Genomics and Bioinformatics, University of Minnesota, Minneapolis, MN Stephen J. Robinson • Agriculture and Agri-Food, Saskatoon Research Centre, Saskatoon, Saskatchewan, Canada Mikeal L. Roose • University of California, Riverside, CA Beatrice Schildknecht • Nottingham Arabidopsis Stock Centre, School of Biosciences, University of Nottingham, Loughborough, UK Michel Schneider • Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland Heiko Schoof • Max Planck Institute for Plant Breeding Research, Plant Computational Biology, Koeln, Germany Lishuang Shen • Virtual Reality Applications Center, Iowa State University, Ames, IA Brian Smith-White • National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD Manuel Spannagl • MIPS Institute for Bioinformatics, GSF Research Center for Environment and Health, Neuherberg, Germany Jason E. Stajich • Department of Molecular Genetics and Microbiology, Duke University, Durham, NC Peter Sterk • EMBL-Outstation The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK Lloyd W. Sumner • The Samuel Roberts Noble Foundation, Plant Biology Division, Ardmore, OK Tatiana Tatusova • National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD Michael Tognolli • Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland Chris D. Town • The Institute for Genomic Research, Rockville, MD Contributors xv

Ewa Urbanczyk-Wochniak • The Samuel Roberts Noble Foundation, Plant Biology Division, Ardmore, OK Kalyan Vemaraju • Delaware Biotechnology Institute, University of Delaware, Newark, DE Steve Wanamaker • University of California, Riverside, CA Jun Wang • Beijing Genomics Institute, Chinese Academy of Sciences, Beijing, China Doreen Ware • Gramene, Cold Spring Harbor, NY David Wheeler • National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD Roger P. Wise • Corn Insects and Crop Genetics Research, USDA-ARS and Department of Plant Pathology and Center for Plant Responses to Environmental Stresses, Iowa State University, Ames, IA