ASGARD: an Open-Access Database of Annotated Transcriptomes for Emerging Model Arthropod Species

Database, Vol. 2012, Article ID bas048, doi:10.1093/database/bas048

Database tool

Victor Zeng and Cassandra G. Extavour*

Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA

*Corresponding author: Tel: +1 617 496 1935; Fax: +1 617 496 9507; Email: [email protected]

Present address: Victor Zeng, Stylux Incorporated, 25 Stickney Road, Atkinson, NH 03811, USA

Submitted 14 August 2012; Revised 1 October 2012; Accepted 16 October 2012

Citation: Zeng, Victor, and Cassandra G. Extavour. 2012. ASGARD: An open-access database of annotated transcriptomes for emerging model arthropod species. Database 2012:bas048. Published version: doi:10.1093/database/bas048. Citable link: http://nrs.harvard.edu/urn-3:HUL.InstRepos:10482548
The increased throughput and decreased cost of next-generation sequencing (NGS) have shifted the bottleneck in genomic research from sequencing to annotation, analysis and accessibility. This is particularly challenging for research communities working on organisms that lack the basic infrastructure of a sequenced genome, or an efficient way to utilize whatever sequence data may be available. Here we present a new database, the Assembled Searchable Giant Arthropod Read Database (ASGARD). This database is a repository and search engine for transcriptomic data from arthropods that are of high interest to multiple research communities but currently lack sequenced genomes. We demonstrate the functionality and utility of ASGARD using de novo assembled transcriptomes from the milkweed bug Oncopeltus fasciatus, the cricket Gryllus bimaculatus and the amphipod crustacean Parhyale hawaiensis. We have annotated these transcriptomes to assign putative orthology, coding region determination, protein domain identification and Gene Ontology (GO) term annotation to all possible assembly products. ASGARD allows users to search all assemblies by orthology annotation, GO term annotation or Basic Local Alignment Search Tool. User-friendly features of ASGARD include search term auto-completion suggestions based on database content, the ability to download assembly product sequences in FASTA format, direct links to NCBI data for predicted orthologs and graphical representation of the location of protein domains and matches to similar sequences from the NCBI non-redundant database. ASGARD will be a useful repository for transcriptome data from future NGS studies on these and other emerging model arthropods, regardless of sequencing platform, assembly or annotation status. This database thus provides easy, one-stop access to multi-species annotated transcriptome information.
We anticipate that this database will be useful for members of multiple research communities, including developmental biology, physiology, evolutionary biology, ecology, comparative genomics and phylogenomics.

Database URL: asgard.rc.fas.harvard.edu

© The Author(s) 2012. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected].

Introduction

In the early 'genomic era' of the late 1990s and early 2000s, the genomes of several long-standing traditional laboratory model organisms were completely sequenced (1–5), which galvanized their respective fields by offering enormous amounts of new data for analysis. Importantly, the beneficial effects of these genome projects were maximized by the simultaneous creation of dedicated web interfaces (e.g. 6–11), or incorporation of the data into existing community databases (e.g. 12), so that users could immediately and easily access and search genome sequences. The advent of next-generation sequencing (NGS) has further advanced biological research not only in traditional model systems, but also in an increasing number of clades that previously lacked genomic data (13–22). High-throughput NGS technology now enables researchers studying non-traditional model organisms to obtain genomic or transcriptomic data relatively efficiently and at modest costs.

Transcriptome and RNA-Seq data are currently the fastest growing category of genomic data across many biological research fields (23, 24). However, unlike the pioneering genome sequence projects, these smaller 'omics' datasets are usually minimally annotated to meet the needs of a specific research goal, and are rarely available or searchable in assembled or annotated form. The NCBI's Sequence Read Archive (SRA) (25) provides a means of archiving data obtained from 454 pyrosequencing, Illumina Genome Analyzer sequencing and other NGS platforms. However, it does not allow for deposition or searching of assembled transcriptomes. Basic Local Alignment Search Tool (BLAST) searches of the SRA data are possible, but only by selecting a single SRA dataset for a given organism at a time. The commonly used NCBI BLAST portal (http://blast.ncbi.nlm.nih.gov/Blast.cgi) does not include SRA data within the nucleotide collection or reference RNA sequences (refseq_rna), although it does allow SRA searches as a specialized BLAST option. The transcriptome shotgun assembly (TSA) database (http://www.ncbi.nlm.nih.gov/genbank/tsa/) allows storage of complete assemblies, but annotation of deposited assemblies is not required. As a result, the potential for leveraging the vast majority of transcriptome data generated is diminished.

[...] data from other emerging model arthropods (19, 21, 22, 56, V. Zeng, B. Ewen-Campen, H.W. Horch et al., submitted for publication). These projects are particularly important for new model organisms for which functional genetic techniques have been developed, as the roles of genes discovered through NGS can be functionally tested in these animals. However, even if these data are deposited in the SRA, as described above, there is typically no public access provided to search the annotated data.

To address this problem, we have created a searchable database of the annotated transcriptomes of three emerging model arthropods, which provide data for a range of phylogenetic diversity within Pancrustacea. All of these organisms have risen to prominence as emerging model organisms due to the ease of maintaining inbred laboratory cultures, year-round embryo collection and gene expression analysis via in situ hybridization and antibody staining. The milkweed bug Oncopeltus fasciatus (Figure 1, left) belongs to the order Hemiptera, the sister order to all holometabolous insects including Drosophila (55). Determination of gene function is possible in O. fasciatus using maternal or embryonic RNA interference (RNAi) (57–61). The amphipod crustacean Parhyale hawaiensis (Figure 1, middle) is a member of the crustacean class Malacostraca and thus serves as a Pancrustacean outgroup to insects (62). Multiple functional genetic tools have been developed for P. hawaiensis, including gene knockdown
