Flybase: Introduction of the Drosophila Melanogaster Release 6 Reference Genome Assembly and Large-Scale Migration of Genome Annotations

Total Page:16

File Type:pdf, Size:1020Kb

Flybase: Introduction of the Drosophila Melanogaster Release 6 Reference Genome Assembly and Large-Scale Migration of Genome Annotations FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation dos Santos, Gilberto, Andrew J. Schroeder, Joshua L. Goodman, Victor B. Strelets, Madeline A. Crosby, Jim Thurmond, David B. Emmert, and William M. Gelbart. 2015. “FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations.” Nucleic Acids Research 43 (Database issue): D690-D697. doi:10.1093/nar/gku1099. http://dx.doi.org/10.1093/nar/gku1099. Published Version doi:10.1093/nar/gku1099 Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:15034860 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA D690–D697 Nucleic Acids Research, 2015, Vol. 43, Database issue Published online 14 November 2014 doi: 10.1093/nar/gku1099 FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations Gilberto dos Santos1,*, Andrew J. Schroeder1, Joshua L. Goodman2, Victor B. Strelets2, Madeline A. Crosby1, Jim Thurmond2, David B. Emmert1, William M. Gelbart1 and the FlyBase Consortium† 1The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA and 2Department of Biology, Indiana University, Bloomington, IN 47405, USA Received September 29, 2014; Revised October 17, 2014; Accepted October 22, 2014 ABSTRACT genome (Release 6). This improved assembly was coordi- nately integrated into FlyBase and NCBI (4) and became Release 6, the latest reference genome assembly of the reference genome assembly for D. melanogaster as of the the fruit fly Drosophila melanogaster, was released summer of 2014. by the Berkeley Drosophila Genome Project in 2014; The challenge when replacing one assembly with another it replaces their previous Release 5 genome assem- is the migration of anchored genomic data from the old as- bly, which had been the reference genome assem- sembly to the new one. This is especially true given that bly for over 7 years. With the enormous amount of the scale of the data involved in the migration to the Re- information now attached to the D. melanogaster lease 6 genome assembly far exceeds that involved in the genome in public repositories and individual labo- last such migration to the Release 5 assembly in 2006; there ratories, the replacement of the previous assembly is an enormous amount of data in public repositories such by the new one is a major event requiring careful mi- as FlyBase, NCBI, modENCODE DCC (5), modMine (6) and the UCSC Genome Browser (7), as well as in private gration of annotations and genome-anchored data databases of individual laboratories that has to be updated. to the new, improved assembly. In this report, we de- The new Release 6 reference genome assembly, the migra- scribe the attributes of the new Release 6 reference tion of FlyBase annotations and genome-anchored data to genome assembly, the migration of FlyBase genome this new genome assembly, and a user guide on how to in- annotations to this new assembly, how genome fea- terrogate it and update genomic coordinates are the topics tures on this new assembly can be viewed in Fly- of this report. Base (http://flybase.org) and how users can convert coordinates for their own data to the corresponding THE RELEASE 6 REFERENCE GENOME ASSEMBLY Release 6 coordinates. Description of BDGP Release 6 genome assembly A reference genome assembly for D. melanogaster was first INTRODUCTION released in 2000 (8,9). This assembly was based on nuclear FlyBase (http://flybase.org) is a database of Drosophila- DNA sequences derived from the ‘iso-1’ reference strain: related genetic and genomic information (1). From late- isogenic yellow (y1); cinnabar (cn1), brown (bw1), speck (sp1) 2006 until mid-2014, the reference genome assembly for (10). Since then, this iso-1-derived reference nuclear genome the biomedical model organism, Drosophila melanogaster, assembly has been revised to close gaps, improve sequence has been the Berkeley Drosophila Genome Project (BDGP, quality and add sequence from centric heterochromatin re- http://fruitfly.org) Release 5 genome assembly (2). During gions (2,11). this period, FlyBase has published 57 updates to the gene The latest BDGP Release 6 of the D. melanogaster model annotations of this species’ genome, and has inte- genome assembly, still based on the iso-1 reference strain, grated many other mapped genome features, most notably includes a total of 143.7 Mb on 1870 scaffolds compris- those of the modENCODE project (3). In the spring of ing 2442 contigs (Table 1). The vast majority, 137.6 Mb, 2014, BDGP released a new assembly of the D. melanogaster of this sequence resides on seven chromosome arms: X, *To whom correspondence should be addressed. Tel: +1 617 495 9925; Fax: +1 617 496 1354; Email: [email protected] †The members of the FlyBase Consortium are listed in the Acknowledgements. C The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] Nucleic Acids Research, 2015, Vol. 43, Database issue D691 2L, 2R, 3L, 3R, 4 and Y (Table 2). Additionally, there are DATA MIGRATION TO THE RELEASE 6 REFERENCE 1862 minor scaffolds that lack precise localization. Almost GENOME ASSEMBLY half of these ‘unlocalized’ scaffolds (n = 884) have been Feature migration mapped to a chromosome region by comparative genome hybridization in embryos with various chromosome defi- FlyBase represents as many features located on the genome ciencies: 2CEN (centromere-proximal region of chromo- as possible. Primary among these are the gene model an- some 2), 3CEN, X, Y, XY (regions mapping to both X and notations, represented by localized exon features that are Y chromosomes) and rDNA (ribosomal DNA) (12). The joined into one or more models representing transcript iso- Release 6 reference genome assembly also replaces the pre- forms. In addition to the gene models, FlyBase contains nu- vious mitochondrial reference genome assembly, a compos- merous other types of localized features including ones that ite of sequences from various D. melanogaster strains, with are part of the sequenced genome (e.g. protein-binding sites, one derived exclusively from the iso-1 reference strain (Ta- enhancers), reagents and variations reported in the litera- ble 1). ture (e.g. aberrations, mutant alleles, mobile element inser- tion sites) and discretely aligned features (cDNAs, protein similarities, gene model predictions, RNA-Seq splice junc- tions). The replacement of the Release 5 genome assembly with Release 6 required migration of millions of discrete fea- Major improvements in the Release 6 genome assembly com- tures and over a hundred RNA-Seq datasets from the old to pared to Release 5 the new assembly. Assembly-to-assembly alignments between Release 5 and Release 6 has a number of important changes from Release Release 6 were generated by NCBI (4), using previously 5(Tables1 and 2). Release 6 is 4.2 Mb larger, even as the described methods (http://www.ncbi.nlm.nih.gov/genome/ total assembly gap length has been reduced to 1.2 Mb, a tools/remap/docs/alignments),andprovidedtoFlyBase. decrease of 1.5 Mb. The main areas of improvement are to Using the information provided in these alignments, a map the centric heterochromatin regions of the major chromo- was generated that identified coordinate spans in Release 5 some arm scaffolds (X, 2L, 2R, 3L, 3R, 4), which have in- that correspond to equivalent unchanged regions in the new corporated over 10 Mb of sequence from minor Release 5 Release 6 assembly. Because the Release 6 assembly con- scaffolds (XHet, 2LHet, 2RHet, 3LHet, 3RHet and U). The tains some regions that are inverted relative to their orienta- chromosome Y scaffold has been improved dramatically, in- tion in Release 5, and other regions that have moved to dif- creasing over 10-fold in size to 3.1 Mb. Almost all remain- ferent scaffolds, the mapping file and necessary coordinate ing gaps in the chromosome arm scaffolds are in the het- transformations needed to take these types of changes into erochromatic regions. Small, unmapped scaffolds are now account. The mapping file underlies the coordinates con- represented individually, instead of being concatenated into verter tool (see below) and is available at FlyBase (http:// pseudoscaffolds (e.g. Release 5 pseudoscaffolds U, 3LHet). flybase.org/reports/FBrf0225389.html). This map was used to perform the coordinate conversions necessary to update the location of the features from their old Release 5 coordi- nates and localize them to the new scaffolds of the Release 6 assembly. Access to the Release 6 and Release 5 genome assemblies The BDGP Release 5 and Release 6 genome assemblies have been deposited at NCBI. Full reports are available at NCBI Realignment of high-throughput datasets (4), which provides global statistics and quality metrics for each assembly, as well as a ‘full sequence report’ and links For many types of high-throughput datasets, direct map- to GenBank (13) and RefSeq (14) accessions for individual ping of reads to the new genome assembly is preferable to scaffolds (Table 3). the migration process described above, since it allows align- Beginning with FlyBase update FB2014 04 (released July ment to newly sequenced regions.
Recommended publications
  • Maternally Inherited Pirnas Direct Transient Heterochromatin Formation
    RESEARCH ARTICLE Maternally inherited piRNAs direct transient heterochromatin formation at active transposons during early Drosophila embryogenesis Martin H Fabry, Federica A Falconio, Fadwa Joud, Emily K Lythgoe, Benjamin Czech*, Gregory J Hannon* CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom Abstract The PIWI-interacting RNA (piRNA) pathway controls transposon expression in animal germ cells, thereby ensuring genome stability over generations. In Drosophila, piRNAs are intergenerationally inherited through the maternal lineage, and this has demonstrated importance in the specification of piRNA source loci and in silencing of I- and P-elements in the germ cells of daughters. Maternally inherited Piwi protein enters somatic nuclei in early embryos prior to zygotic genome activation and persists therein for roughly half of the time required to complete embryonic development. To investigate the role of the piRNA pathway in the embryonic soma, we created a conditionally unstable Piwi protein. This enabled maternally deposited Piwi to be cleared from newly laid embryos within 30 min and well ahead of the activation of zygotic transcription. Examination of RNA and protein profiles over time, and correlation with patterns of H3K9me3 deposition, suggests a role for maternally deposited Piwi in attenuating zygotic transposon expression in somatic cells of the developing embryo. In particular, robust deposition of piRNAs targeting roo, an element whose expression is mainly restricted to embryonic development, results in the deposition of transient heterochromatic marks at active roo insertions. We hypothesize that *For correspondence: roo, an extremely successful mobile element, may have adopted a lifestyle of expression in the [email protected] embryonic soma to evade silencing in germ cells.
    [Show full text]
  • The Biogrid Interaction Database
    D470–D478 Nucleic Acids Research, 2015, Vol. 43, Database issue Published online 26 November 2014 doi: 10.1093/nar/gku1204 The BioGRID interaction database: 2015 update Andrew Chatr-aryamontri1, Bobby-Joe Breitkreutz2, Rose Oughtred3, Lorrie Boucher2, Sven Heinicke3, Daici Chen1, Chris Stark2, Ashton Breitkreutz2, Nadine Kolas2, Lara O’Donnell2, Teresa Reguly2, Julie Nixon4, Lindsay Ramage4, Andrew Winter4, Adnane Sellam5, Christie Chang3, Jodi Hirschman3, Chandra Theesfeld3, Jennifer Rust3, Michael S. Livstone3, Kara Dolinski3 and Mike Tyers1,2,4,* 1Institute for Research in Immunology and Cancer, Universite´ de Montreal,´ Montreal,´ Quebec H3C 3J7, Canada, 2The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada, 3Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA, 4School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JR, UK and 5Centre Hospitalier de l’UniversiteLaval´ (CHUL), Quebec,´ Quebec´ G1V 4G2, Canada Received September 26, 2014; Revised November 4, 2014; Accepted November 5, 2014 ABSTRACT semi-automated text-mining approaches, and to en- hance curation quality control. The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access database that houses genetic and protein in- INTRODUCTION teractions curated from the primary biomedical lit- Massive increases in high-throughput DNA sequencing erature for all major model organism species and technologies (1) have enabled an unprecedented level of humans. As of September 2014, the BioGRID con- genome annotation for many hundreds of species (2–6), tains 749 912 interactions as drawn from 43 149 pub- which has led to tremendous progress in the understand- lications that represent 30 model organisms.
    [Show full text]
  • Flybase: Updates to the Drosophila Melanogaster Knowledge Base Aoife Larkin 1,*, Steven J
    Published online 21 November 2020 Nucleic Acids Research, 2021, Vol. 49, Database issue D899–D907 doi: 10.1093/nar/gkaa1026 FlyBase: updates to the Drosophila melanogaster knowledge base Aoife Larkin 1,*, Steven J. Marygold 1, Giulia Antonazzo1, Helen Attrill 1, Gilberto dos Santos2, Phani V. Garapati1, Joshua L. Goodman3,L.SianGramates 2, Gillian Millburn1, Victor B. Strelets3, Christopher J. Tabone2, Jim Thurmond3 and FlyBase Consortium† 1 Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge Downloaded from https://academic.oup.com/nar/article/49/D1/D899/5997437 by guest on 15 January 2021 CB2 3DY, UK, 2The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA and 3Department of Biology, Indiana University, Bloomington, IN 47405, USA Received September 15, 2020; Revised October 13, 2020; Editorial Decision October 14, 2020; Accepted October 22, 2020 ABSTRACT In this paper, we highlight a variety of new features and improvements that have been integrated into FlyBase since FlyBase (flybase.org) is an essential online database our last review (3). We have introduced a number of new fea- for researchers using Drosophila melanogaster as a tures that allow users to more easily find and use the data in model organism, facilitating access to a diverse ar- which they are interested; these include customizable tables, ray of information that includes genetic, molecular, improved searching using expanded Experimental Tool Re- genomic and reagent resources. Here, we describe ports, more links to and from other resources, and bet- the introduction of several new features at FlyBase, ter provision of precomputed bulk files. We have upgraded including Pathway Reports, paralog information, dis- tools such as Fast-Track Your Paper and ID validator.
    [Show full text]
  • Annual Scientific Report 2011 Annual Scientific Report 2011 Designed and Produced by Pickeringhutchins Ltd
    European Bioinformatics Institute EMBL-EBI Annual Scientific Report 2011 Annual Scientific Report 2011 Designed and Produced by PickeringHutchins Ltd www.pickeringhutchins.com EMBL member states: Austria, Croatia, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Israel, Italy, Luxembourg, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, United Kingdom. Associate member state: Australia EMBL-EBI is a part of the European Molecular Biology Laboratory (EMBL) EMBL-EBI EMBL-EBI EMBL-EBI EMBL-European Bioinformatics Institute Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD United Kingdom Tel. +44 (0)1223 494 444, Fax +44 (0)1223 494 468 www.ebi.ac.uk EMBL Heidelberg Meyerhofstraße 1 69117 Heidelberg Germany Tel. +49 (0)6221 3870, Fax +49 (0)6221 387 8306 www.embl.org [email protected] EMBL Grenoble 6, rue Jules Horowitz, BP181 38042 Grenoble, Cedex 9 France Tel. +33 (0)476 20 7269, Fax +33 (0)476 20 2199 EMBL Hamburg c/o DESY Notkestraße 85 22603 Hamburg Germany Tel. +49 (0)4089 902 110, Fax +49 (0)4089 902 149 EMBL Monterotondo Adriano Buzzati-Traverso Campus Via Ramarini, 32 00015 Monterotondo (Rome) Italy Tel. +39 (0)6900 91402, Fax +39 (0)6900 91406 © 2012 EMBL-European Bioinformatics Institute All texts written by EBI-EMBL Group and Team Leaders. This publication was produced by the EBI’s Outreach and Training Programme. Contents Introduction Foreword 2 Major Achievements 2011 4 Services Rolf Apweiler and Ewan Birney: Protein and nucleotide data 10 Guy Cochrane: The European Nucleotide Archive 14 Paul Flicek:
    [Show full text]
  • The Drosophila Melanogaster Transcriptome by Paired-End RNA Sequencing
    Downloaded from genome.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press Resource The Drosophila melanogaster transcriptome by paired-end RNA sequencing Bryce Daines,1,2,6 Hui Wang,2,6 Liguo Wang,3,6 Yumei Li,1,2 Yi Han,2 David Emmert,4 William Gelbart,4 Xia Wang,1,2 Wei Li,3,5 Richard Gibbs,1,2,7 and Rui Chen1,2,7 1Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA; 2Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; 3Dan L. Duncan Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA; 4Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA; 5Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas 77030, USA RNA-seq was used to generate an extensive map of the Drosophila melanogaster transcriptome by broad sampling of 10 developmental stages. In total, 142.2 million uniquely mapped 64–100-bp paired-end reads were generated on the Illumina GA II yielding 3563 sequencing coverage. More than 95% of FlyBase genes and 90% of splicing junctions were observed. Modifications to 30% of FlyBase gene models were made by extension of untranslated regions, inclusion of novel exons, and identification of novel splicing events. A total of 319 novel transcripts were identified, representing a 2% increase over the current annotation. Alternate splicing was observed in 31% of D. melanogaster genes, a 38% increase over previous estimations, but significantly less than that observed in higher organisms. Much of this splicing is subtle such as tandem alternate splice sites.
    [Show full text]
  • Rnacentral: Progress in the Development of the Non-Coding RNA Sequence Database
    RNAcentral: Progress in the Development of the Non-coding RNA Sequence Database Other potential titles: ● RNAcentral: New Developments in the Non-coding RNA Sequence Database ● RNAcentral: Three Years of ncRNA Data Integration Anton I. Petrov​1,​ Blake A. Sweeney​1,​ Boris Burkov​1*,​ Simon Kay​1,​ Rob Finn​1,​ Alex Bateman​1,​ and The RNAcentral Consortium European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD *To whom correspondence should be addressed: [email protected] BACKGROUND More than fifty major specialised resources exist in the area of non-coding RNA (ncRNA) research. RNAcentral (h​ttp://rnacentral.org)​ provides a unified interface that aggregates information from over twenty of them, including: - larger databases (e.g. RefSeq, ENA, Ensembl!) - databases with specific content type (e.g. Rfam, PDBe) - RNA type-specific databases (e.g. miRBase, GtRNAdb, snoPY) - organism-specific databases (e.g. HGNC, FlyBase, PomBase). RESULTS Launched in 2014 [2], RNAcentral currently contains over 10 million ncRNA sequences from more than 20 RNA databases and Is updated several times a year. Recent updates include ncRNA data from: - HGNC - Ensembl - FlyBase - Rfam (most of sequences in RNAcentral are annotated with Rfam families, while the rest are candidates for producing new Rfam families) There are three main ways of browsing the data through the RNAcentral website. - The text search makes it easy to explore all ncRNA sequences, compare data across different resources, and discover what is known about each ncRNA. - Using the sequence similarity search one can search data from multiple RNA databases starting from a sequence. - Finally, one can explore ncRNAs in select species by genomic location using an integrated genome browser.
    [Show full text]
  • Basic Bioinformatics
    Basic Bioinformatics Purpose of this protocol: From your literature reading, you might find proteins from other model systems (for example mouse) that have known roles in apoptosis, either upstream or downstream of IAPs. You will need to find out: A Search GenBank for the mammalian gene(s) mentioned in the literature, get the GenBank accession numbers of these genes; B Search FlyBase to identify Drosophila homologs and to find out more information on these genes – CG number of the gene, coding and 5’ and 3’-UTR sequences, and other information on the function of this gene. Also find out if it has alternative isoforms; C Search OpenBiosystems to find out if they carry RNAi clones of these genes, and get the clone ID and catalog number; This protocol has three sections, to answer the above three questions. Part A: Find genes in GenBank by name. 1) Go to NCBI website: http://www.ncbi.nlm.nih.gov/ . Under “Search”, select “Nucleotide” and type in the gene name and species that you want to search for. Use Booleans, and tags which you can find at the PubMed help page: http://www.ncbi.nlm.nih.gov/entrez/query/static/help/pmhelp.html#SearchFieldD escriptionsandTags If you know the author who published the paper on the gene you are interested, you can also include the author name in your search along with the gene name and species. Note: You will usually get more than one result. Read the title of each to screen out the ones that are obviously not what you are looking for.
    [Show full text]
  • Flybase 2.0: the Next Generation Jim Thurmond1, Joshua L
    Published online 26 October 2018 Nucleic Acids Research, 2019, Vol. 47, Database issue D759–D765 doi: 10.1093/nar/gky1003 FlyBase 2.0: the next generation Jim Thurmond1, Joshua L. Goodman1, Victor B. Strelets1, Helen Attrill2, L. Sian Gramates 3, Steven J. Marygold2, Beverley B. Matthews3, Gillian Millburn2, Giulia Antonazzo2, Vitor Trovisco2, Thomas C. Kaufman1, Brian R. Calvi1,* and the FlyBase Consortium† 1Department of Biology, Indiana University, Bloomington, IN 47408, USA, 2Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK and 3The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA Received September 20, 2018; Editorial Decision October 06, 2018; Accepted October 09, 2018 ABSTRACT primary scientific literature covering more than a century of genetics research. Over the years, the consortium has de- FlyBase (flybase.org) is a knowledge base that sup- veloped new formats of data display and new bioinformatic ports the community of researchers that use the fruit tools to mine these data for biological discovery and trans- fly, Drosophila melanogaster, as a model organism. lational research. These efforts have transformed FlyBase The FlyBase team curates and organizes a diverse from a simple database into a powerful knowledge base. array of genetic, molecular, genomic, and develop- The FlyBase site has undergone major changes since our mental information about Drosophila. At the begin- last review two years ago (1). In February 2017, we released ning of 2018, ‘FlyBase 2.0’ was released with a signifi- a beta version of the next-generation website, which we have cantly improved user interface and new tools.
    [Show full text]
  • Phylogenetic Profiling of the Arabidopsis Thaliana Proteome
    Open Access Research2004Gutiérrezet al.Volume 5, Issue 8, Article R53 Phylogenetic profiling of the Arabidopsis thaliana proteome: what comment proteins distinguish plants from other organisms? Rodrigo A Gutiérrez*†§, Pamela J Green‡, Kenneth Keegstra*† and John B Ohlrogge† Addresses: *Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, MI 48824-1312, USA. †Department of Plant Biology, Michigan State University, East Lansing, MI 48824-1312, USA. ‡Delaware Biotechnology Institute, University of Delaware, 15 § Innovation Way, Newark, DE 19711, USA. Current address: Department of Biology, New York University, 100 Washington Square East, New reviews York, NY 10003, USA. Correspondence: Rodrigo A Gutiérrez. E-mail: [email protected] Published: 15 July 2004 Received: 16 March 2004 Revised: 10 May 2004 Genome Biology 2004, 5:R53 Accepted: 7 June 2004 The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2004/5/8/R53 reports © 2004 Gutiérrez et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL Phylogenetic<p>Theopportunitycodingof life</p> genes availability to profilingof decipher <it>Arabidopsis of theof the the complete genetic Arabidopsis thaliana genomefactors </it>onthaliana that sequence define the proteome: ofbasis plant <it>Arab of form whattheir idopsisand proteinspattern function. thaliana of distingu sequence To </it>together beginish similarityplants this task, from wi to thwe other organismsthose have organisms? of classified other across organisms the the nuc threlear proe domains protein-vides an deposited research Abstract Background: The availability of the complete genome sequence of Arabidopsis thaliana together with those of other organisms provides an opportunity to decipher the genetic factors that define plant form and function.
    [Show full text]
  • Arabidopsis Thaliana Functional Genomics Project Annual Report 2008
    The Multinational Coordinated Arabidopsis thaliana Functional Genomics Project Annual Report 2008 Xing Wang Deng [email protected] Chair Joe Kieber [email protected] Co-chair Joanna Friesner [email protected] MASC Coordinator and Executive Secretary Thomas Altmann [email protected] Sacha Baginsky [email protected] Ruth Bastow [email protected] Philip Benfey [email protected] David Bouchez [email protected] Jorge Casal [email protected] Danny Chamovitz [email protected] Bill Crosby [email protected] Joe Ecker [email protected] Klaus Harter [email protected] Marie-Theres Hauser [email protected] Pierre Hilson [email protected] Eva Huala [email protected] Jaakko Kangasjärvi [email protected] Julin Maloof [email protected] Sean May [email protected] Peter McCourt [email protected] Harvey Millar [email protected] Ortrun Mittelsten- Scheid [email protected] Basil Nikolau [email protected] Javier Paz-Ares [email protected] Chris Pires [email protected] Barry Pogson [email protected] Ben Scheres [email protected] Randy Scholl [email protected] Heiko Schoof [email protected] Kazuo Shinozaki [email protected] Klaas van Wijk [email protected] Paola Vittorioso [email protected] Wolfram Weckwerth [email protected] Weicai Yang [email protected] The Multinational Arabidopsis Steering Committee—July 2008 Front Cover Design Philippe Lamesch, Curator at TAIR/Carnegie Institute for Science, and Joanna Friesner, MASC Coordinator at the University of California, Davis, USA Images Map of geographical distribution of ecotypes of Arabidopsis thaliana Updated representation contributed by Philippe Lamesch, adapted from Jonathon Clarke, UK (1993).
    [Show full text]
  • Bioinformatics Approaches for Functional Predictions in Diverse Informatics Environments. Paula Maria Moolhuijzen, Bsc This Thes
    Bioinformatics approaches for functional predictions in diverse informatics environments. Paula Maria Moolhuijzen, BSc This thesis is presented for the degree of Doctor of Philosophy of Murdoch University 2011 i Declaration I declare that this thesis is my own account of my research and contains as its main content work, which has not previously been submitted for a degree at any tertiary education institution. Signature: Name: Paula Moolhuijzen Date: 4th October 2011 ii Abstract Bioinformatics is the scientific discipline that collates, integrates and analyses data and information sets for the life sciences. Critically important in agricultural and biomedical fields, there is a pressing need to integrate large and diverse data sets into biologically significant information. This places major challenges on research strategies and resources (data repositories, computer infrastructure and software) required to integrate relevant data and analysis workflows. These challenges include: ! The construction of processes to integrate data from disparate and diverse resources and legacy systems that have variable data formats, qualities, availability and accessibility constraints. ! Substantially contributing to hypothesis driven research for biologically significant information. The hypothesis proposed in this thesis is that in organisms from divergent origins, with differing data availability and analysis resources, in silico approaches can identify genomic targets in a range of disease systems. The particular aims were to: 1. Overcome data constraints that impact analysis of different organisms. 2. Make functional genomic predictions in diverse biological systems. 3. Identify specific genomic targets for diagnostics and therapeutics in diverse disease mechanisms. iii In order to test the hypothesis three case studies in human cancer, pathogenic bacteria, and parasitic arthropod were selected, the results are as follows.
    [Show full text]
  • Abstracts Selected for Platform Presentation
    Abstracts Selected for Platform Presentation Using semantic similarity analysis based on Human Phenotype Ontology terms to identify genetic etiologies for epilepsy and neurodevelopmental disorders Ingo Helbig1,2,3, Shiva Ganesan1,2, Peter Galer1,2, Colin A. Ellis3, Katherine L. Helbig1,2 1 Division of Neurology, Children’s Hospital of Philadelphia, Philadelphia, PA, 19104 USA 2 Department of Biomedical and Health Informatics (DBHi), Children’s Hospital of Philadelphia, Philadelphia, PA, 19104 USA 3 Department of Neurology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, 19104 USA More than 25% of children with intractable epilepsies have an identifiable genetic cause, and the goal of epilepsy precision medicine is to stratify patients based on genetic findings to improve diagnosis and treatment. Large-scale genetic studies in > 15,000 epilepsy patients have provided deep insights into causative genetic changes. However, the understanding of how these changes link to specific clinical features is lagging behind. This deficit is largely due to the lack of frameworks to analyze large-scale phenotypic data, which dramatically impedes the ability to translate genetic findings into improved treatment. The Human Phenotype Ontology (HPO) has been developed as a curated terminology, with formal semantic relationships between more than 13,500 phenotypic terms, and has been adopted by many diagnostic laboratories, the UK’s 100,000 Genomes Project, the NIH Undiagnosed Diseases Network, the GA4GH, and hundreds of global initiatives. The HPO also contains a rich vocabulary for epilepsy-related phenotypes, which we have developed in prior large research networks and expanded in current projects. However, analytical approaches leveraging this rich data resource have not been implemented so far.
    [Show full text]