KEGG Orthology-Based Annotation of the Predicted
Dunlap et al. BMC Genomics 2013, 14:509
http://www.biomedcentral.com/1471-2164/14/509

DATABASE Open Access

KEGG orthology-based annotation of the predicted proteome of Acropora digitifera: ZoophyteBase - an open access and searchable database of a coral genome

Walter C Dunlap1,2, Antonio Starcevic4, Damir Baranasic4, Janko Diminic4, Jurica Zucko4, Ranko Gacesa4, Madeleine JH van Oppen1, Daslav Hranueli4, John Cullum5 and Paul F Long2,3*

Abstract

Background: Contemporary coral reef research has firmly established that a genomic approach is urgently needed to better understand the effects of anthropogenic environmental stress and global climate change on coral holobiont interactions. Here we present KEGG orthology-based annotation of the complete genome sequence of the scleractinian coral Acropora digitifera and provide the first comprehensive view of the genome of a reef-building coral by applying advanced bioinformatics.

Description: Sequences from the KEGG database of protein function were used to construct hidden Markov models. These models were used to search the predicted proteome of A. digitifera to establish complete genomic annotation. The annotated dataset is published in ZoophyteBase, an open access format with different options for searching the data. A particularly useful feature is the ability to use a Google-like search engine that links query words to protein attributes.
We present features of the annotation that underpin the molecular structure of key processes of coral physiology that include (1) regulatory proteins of symbiosis, (2) planula and early developmental proteins, (3) neural messengers, receptors and sensory proteins, (4) calcification and Ca2+-signalling proteins, (5) plant-derived proteins, (6) proteins of nitrogen metabolism, (7) DNA repair proteins, (8) stress response proteins, (9) antioxidant and redox-protective proteins, (10) proteins of cellular apoptosis, (11) microbial symbioses and pathogenicity proteins, (12) proteins of viral pathogenicity, (13) toxins and venom, (14) proteins of the chemical defensome and (15) coral epigenetics.

Conclusions: We advocate that providing annotation in an open-access searchable database available to the public domain will give an unprecedented foundation to interrogate the fundamental molecular structure and interactions of coral symbiosis and allow critical questions to be addressed at the genomic level based on combined aspects of evolutionary, developmental, metabolic, and environmental perspectives.

Keywords: Acropora digitifera, KEGG orthology, Database, Annotation, Proteome, Genome, Coral, Symbiosis, Cnidaria

* Correspondence: [email protected]
2Institute of Pharmaceutical Science, King’s College London, Franklin-Wilkins Building, 150 Stamford Street, London SE1 9NH, United Kingdom
3Department of Chemistry, King’s College London, Franklin-Wilkins Building, 150 Stamford Street, London SE1 9NH, United Kingdom
Full list of author information is available at the end of the article

© 2013 Dunlap et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
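The annotation step summarised in the Description (profile hidden Markov models built from KEGG sequences and searched against the predicted proteome) ends with each protein being given a KEGG Orthology (KO) label. A minimal sketch of that last step is shown here, assuming the search results are already available as (protein, KO, E-value) records. The `Hit` class, the `assign_best_ko` function, the protein and KO labels, and the 1e-5 E-value cutoff are all hypothetical choices for illustration; the abstract does not state the selection rule or threshold actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One profile-HMM match (all field values below are made-up examples)."""
    protein_id: str  # identifier of a predicted protein
    ko_id: str       # label of the KO profile that matched
    evalue: float    # E-value reported for the match


def assign_best_ko(hits, max_evalue=1e-5):
    """Return {protein_id: ko_id}, keeping the lowest-E-value hit per protein.

    Hits with an E-value above max_evalue (an assumed cutoff) are ignored.
    """
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # too weak to support an annotation
        current = best.get(hit.protein_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.protein_id] = hit
    return {protein_id: hit.ko_id for protein_id, hit in best.items()}


# Hypothetical search results: prot_0001 matches two profiles,
# prot_0002 has only a weak match, prot_0003 has one strong match.
example_hits = [
    Hit("prot_0001", "KO_A", 1e-120),
    Hit("prot_0001", "KO_B", 1e-30),
    Hit("prot_0002", "KO_C", 1e-3),
    Hit("prot_0003", "KO_D", 1e-45),
]

annotation = assign_best_ko(example_hits)
print(annotation)
```

In this toy input, prot_0001 keeps the stronger of its two matches and prot_0002 is left unannotated because its only match exceeds the assumed cutoff. A real pipeline would obtain the hit records from a profile-HMM search tool, which is not shown here.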
Background

All of the reef-building corals (Scleractinia; phylum Cnidaria) that create the vast calcium carbonate deposits of coral reefs have evolved an endosymbiotic partnership with photosynthetic dinoflagellates of the genus Symbiodinium (Dinophyceae), commonly known as zooxanthellae, which reside within the gastrodermal cells of their scleractinian host [1-3]. Coral-algal symbiosis is a cooperative metabolic adaptation necessary for survival in the shallow oligotrophic (nutrient-poor) waters of tropical and subtropical marine environments [4,5] that drives the productivity of coral reefs [6]. Coral reefs provide habitat and trophic support for many thousands of marine species, the richness of which rivals the biodiversity of tropical rainforests [7]. Underlying the basic requirements of corals for growth, reproduction and survival are special needs to accommodate symbiont-specific host recognition and to control innate and responsive immune systems; what is likely to emerge from future research is the extent to which the host is involved in direct regulation of its endosymbiont populations. Much is understood about the cellular biology of cnidarian-dinoflagellate symbiosis (reviewed in [8]), but less is known at the molecular level of coral symbiology.

There is little opposition to the contention that environmental and anthropogenic disturbances are causing alarming losses to coral reefs ([9] and references therein). Threats to productivity are being imposed by the disruption of coral symbiosis (apparent as “coral bleaching”) caused in response to increasing thermal stress attributed to global warming [10,11], from an increase in stress-related coral disease [12-14], from the discharge of domestic and industrial wastes, pollutants from agricultural development and the transport of sediments in terrestrial runoff [15,16], and potentially from imminent declines in coral calcification owing to rising ocean acidification [17-19]. Accordingly, we require a better understanding of the molecular stress responses and adaptive potential of corals. Such information is necessary to predict bleaching events and so better inform effective management policies for the conservation of coral reef ecosystems [20-24].

To understand how coral holobionts respond to environmental change at the molecular level, the identification of genes that may respond by transcription to stress is of primary importance [25]. Thus, the use of transcriptomic methodologies to identify stress-responsive genes has been highly successful [26-32]. Transcriptome high-throughput profiling has allowed changes in gene expression across thousands of genes to be measured simultaneously. Fuelled by data-generating power, the number of coral-based studies utilising transcriptomics to investigate molecular responses to environmental stressors has expanded greatly by the acquisition of expressed sequence tag (EST) gene libraries, the fabrication of microarray biochips used to estimate levels of mRNA expression, and by direct analysis using next-generation, high-throughput sequencing. However, much of this work has been conducted using the aposymbiotic state of pre-settlement coral larvae, so transcribed genes relevant to metamorphosis and the cytobiology of the adult polyp are limited to a few recent studies [33-36]. The transcriptome additionally does not provide the structural framework and essential regulatory elements of the functional genome for comprehensive evaluation. Recently, deep metatranscriptomic sequencing of two adult coral holobiomes has been made available on searchable databases: PocilloporaBase for Pocillopora damicornis [36] and PcarnBase for Platygyra carnosus [37]. In contrast, high-throughput metaproteomic analyses to quantify the product yield of stress-response genes of the coral holobiome are yet to be widely adopted by the coral reef scientific community, despite the proteome being the ultimate measure of the coral phenotype [38,39].

The early accumulation of transcriptomic data revealed that a small proportion of coral ESTs matched genes known previously only from other kingdoms of life, implying that the ancestral animal genome contained many genes traditionally regarded as ‘non-animal’ that have been lost from most animal genomes [40]. Furthermore, an unexpected revelation from EST data is the greater extent to which coral sequences resemble human genes than those of the Drosophila and Caenorhabditis model invertebrate genomes [41,42]. Comparative genomic analysis has revealed higher genetic divergence and massive gene loss within the ecdysozoan lineages. Hence, many genes assumed to have much later evolutionary origins are likely to have been present in an ancestral or early-diverged metazoan [43]. While much of the animal kingdom remains yet to be explored, examples of the metazoan phylum Cnidaria provide a unique insight into the deep evolutionary origins of at least some vertebrate gene families [42]. Thus, the complete genomic sequence of a coral is likely to reveal many genes previously assumed to be strictly vertebrate innovations.

To date, cnidarian genomes have been published for the sea anemone Nematostella vectensis [42] and the hydroid Hydra magnipapillata [44]. Only the coral genome of Acropora digitifera is available without restriction on use of its published sequence [45], but the compiled sequence has not been fully annotated. At the time of this writing, the genome assembly of Acropora millepora has been released to the public domain [46], also without full annotation, but an embargo is imposed on use of this data that is highly restrictive to the progress of further studies. Understanding how genomic variation affects molecular and organismal biology is the ultimate justification of genome sequencing, and annotation is an essential step in this process. We envisage that unrestricted access to annotation of the A. digitifera genome will provide an unprecedented foundation to freely interrogate the generic molecular structure, possible